Bruton's tyrosine kinase as anti-cancer drug target

ABSTRACT

Receptor protein tyrosine kinases (RPTKs) transmit extracellular signals across the plasma membrane to cytosolic proteins, stimulating the formation of complexes that regulate key cellular functions. Over half of the 90 tyrosine kinases have been implicated in human cancers and are for this reason considered highly promising drug targets. To gain insight into the tyrosine kinases that contribute to breast cancer related cellular mechanisms, we carried out a large-scale loss-of-function analysis of the tyrosine kinases, using RNA interference, in the clinically relevant Erb-B2 positive, BT474 breast cancer cell line. The cytosolic, non-receptor tyrosine kinase Bruton's tyrosine kinase (BTK), which has been extensively studied for its role in B cell development, was among those tyrosine kinase genes required for BT474 breast cancer cell survival. The BTK protein identified was an alternative form containing an amino-terminal extension. This alternative form of the Btk message is also present in tumorigenic breast cells at significantly higher levels than in normal breast cells.

FIELD

Embodiments of the invention find application in the field of cancer therapy.

BACKGROUND

Protein tyrosine kinases (PTKs) mediate the reversible process of tyrosine phosphorylation, providing the signals that activate or block signal transduction pathways that govern cell survival decisions, and as such are tightly regulated. Genes that regulate extracellular growth, differentiation and developmental signals are commonly mutated in cancers. It is perhaps not surprising, therefore, that PTKs comprise the largest group of dominant oncogenes. Thirty of the 58 receptor protein tyrosine kinases (RPTKs) have been implicated in human cancer (Blume-Jensen and Hunter, 2001). Fewer than half of the cytoplasmic protein tyrosine kinases have been associated with tumorigenesis. This reflects not a less critical role in signal transduction regulation but an experimental bias that has focused on viral counterparts to gain insight into potential transforming mechanisms (Blume-Jensen and Hunter, 2001).

In recent years there has been a surge in efforts to discover genes critical to cancer signaling pathways that, when inhibited, would provide specific anti-cancer therapies (Lu and Chu, 2008; Sabbah et al., 2008). Trastuzumab (Herceptin), a humanized monoclonal antibody that specifically inhibits the HER2/neu/ErbB-2 (hereafter referred to as ErbB-2) receptor tyrosine kinase, which is amplified and/or over-expressed in 25-30% of metastatic breast cancers, was the first targeted therapy to be approved by the FDA. As a single-agent monotherapy, however, the primary response rate to trastuzumab is low (12% to 34%) and the rate of primary resistance is high, between 66% and 88% (Nahta and Esteva, 2006). Notably, however, the time to disease progression, response rate and overall survival increase when trastuzumab is used in combination with paclitaxel or docetaxel (Nahta and Esteva, 2006). Indeed, recent successes in targeting molecules integral to survival pathways in combination with traditional chemotherapeutics have led to significant efforts to identify new drug targets that sensitize the breast cancer cell towards cell death (MacKeigan et al., 2005; Call et al., 2008). Such additional drug targets, specific to or over-expressed in breast cancer cells compared to normal tissues, and known to be functionally relevant, are still needed, as are cancer-specific markers for use in detecting or diagnosing cancer.

SUMMARY

In one embodiment, the instant invention provides a method of treating cancer, comprising:

-   a) providing a subject with breast cancer cells and an inhibitor of a gene encoding a cytoplasmic tyrosine kinase, and
-   b) treating said subject with said inhibitor.

In a preferred embodiment, the cytoplasmic tyrosine kinase is a member of the Tec family of cytoplasmic tyrosine kinases and, in a more preferred embodiment, the cytoplasmic tyrosine kinase is Bruton's Tyrosine Kinase.

In another embodiment, the cytoplasmic tyrosine kinase is a variant of Bruton's Tyrosine Kinase comprising an amino-terminal extension. The amino acid sequence of the variant is SEQ ID NO. 1. The amino-terminal extension is SEQ ID NO. 2. In one embodiment the extension comprises an additional 34 amino acids.

In one embodiment, the method of treating cancer comprises treating with an inhibitor that comprises an interfering RNA. Preferably, the treatment with the RNA results in reduced proliferation of the breast cancer cells.

In one embodiment, the interfering RNA is SEQ ID NO. 3.

In one embodiment, the instant invention provides a method of diagnosing cancer, comprising:

-   a) providing cells suspected to be breast cancer cells and a ligand capable of binding to a variant of Bruton's Tyrosine Kinase, said variant comprising an amino-terminal extension;
-   b) contacting said cells with said ligand under conditions wherein said variant is detected.

In one embodiment, the amino-terminal extension of the variant used to diagnose cancer comprises an additional 34 amino acids. In a preferred embodiment the ligand used binds to a portion of the 34 amino acid extension. In one embodiment, the ligand comprises an antibody or a fragment thereof.

In another embodiment, the invention provides a composition comprising a variant of Bruton's Tyrosine Kinase comprising an amino-terminal extension, the extension preferably comprising an additional 34 amino acids.

In another embodiment, the invention provides a ligand-protein complex comprising an antibody bound to the variant of Bruton's Tyrosine Kinase.

In yet another embodiment, the invention provides a kit for diagnosing cancer, the kit comprising a ligand capable of binding to a variant of Bruton's Tyrosine Kinase and instructions for its use.

FIGURE LEGENDS

FIG. 1. An RNAi screen targeting tyrosine kinase genes in an ERBB2 (HER2/neu) positive breast cancer. BT474 breast cancer cells were transfected with 234 shRNA constructs targeting 83 protein tyrosine kinase genes. Three transfection mixes were produced for each shRNA and each was transfected into triplicate wells of BT474 cells for 96 hours. AlamarBlue was used to monitor cell proliferation and viability. The averages of the nine parallel cultures were calculated for each shRNA, normalized to transfection efficiency, expressed as % of the control shRNA (luciferase) and sorted on the basis of effect (top panel). z-scores were calculated using the following formula: (normalized sample value−normalized data set mean)/data set standard deviation. shRNAs that produced z-scores less than −1.1 are presented in a list (bottom panel).
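The z-score formula in the FIG. 1 legend can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used for the screen: the shRNA names and viability values are made-up placeholders, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the legend does not say which was used.

```python
from statistics import mean, pstdev


def z_scores(normalized):
    """Map each shRNA name to (value - data set mean) / data set standard deviation."""
    values = list(normalized.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {name: (value - mu) / sigma for name, value in normalized.items()}


def hits(normalized, cutoff=-1.1):
    """Return the names whose z-score falls below the cutoff (the legend lists z < -1.1)."""
    return sorted(name for name, z in z_scores(normalized).items() if z < cutoff)


# Hypothetical normalized viability values (% of control shRNA); not data from the screen.
example = {
    "shRNA_1": 100.0,
    "shRNA_2": 95.0,
    "shRNA_3": 105.0,
    "shRNA_4": 98.0,
    "shRNA_5": 40.0,
}
print(hits(example))
```

With these placeholder values only `shRNA_5`, the one construct with markedly reduced viability, falls below the −1.1 cutoff.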

FIG. 2. Btk knockdown in BT474 cells leads to increased apoptosis. (a) Brightfield image after 96 hr of siRNA knockdown of Btk in BT474 cells. (b-c) siRNA knockdown of Btk in BT474 cells (48 hr) results in increased cleaved caspase-3 (CC3) compared to a scrambled siRNA. (b) Apoptotic cells were calculated as a percentage of the total cellular population.

FIG. 3. An alternative form of the Btk transcript is present in BT474 breast cancer cells. (a) Nucleotide sequence 1-395 bp from the published Btk sequence (accession #U13399) was aligned to the nucleotide sequence obtained from BT474 cells using 5′RACE. Identical sequence is highlighted in grey. The BT474 sequence obtained using 5′RACE translates into an additional 47 amino acid open reading frame (ORF) and contains two additional methionine codons, highlighted in green, that are in frame with the methionine start codon of the published Btk gene (highlighted in green and with an arrow). (b) Schematic representation showing the location of the Btk gene on the X-chromosome (b & c) and schematic representations comparing the location of the Btk-A and Btk-C exon 1.

FIG. 4. BTK-C yields an 80 kD BTK specific product. (a) A schematic representation showing the conserved domains of the BTK-A protein is compared to a schematic of the predicted BTK-C protein. (b) Total lysate from BT474 cells and a malignant B-cell line positive for Btk-A (Namalwa) were subjected to immunoblotting with the BTK antibody. (c) 293FT cells were cotransfected with a BTK-A or BTK-C flag tag overexpression vector as well as a Btk shRNA or a control shRNA targeting the firefly luciferase gene. (d) siRNAs targeting stably overexpressed BTK-C in BT474 cells lead to efficient knockdown. BT474 cells were transfected with siRNAs and total lysate was used for immunoblotting with the BTK antibody.

FIG. 5. BTK-C is activated in BT474 cells. (a) In BT474 cells both forms of the over-expressed Btk-C proteins are phosphorylated on tyrosine residue 223, which becomes autophosphorylated after activation. Total lysate was prepared from BT474 cells containing the stably integrated Btk-A or Btk-C flag tag MarxIV vectors. Control cells contain a stably integrated MarxIV flag tag vector encoding the beta-galactosidase gene (β-gal), which retains its stop codon. Tyrosine-phosphorylated BTK was assessed by immunoprecipitation (IP) using anti-Flag and Western blot (WB) analysis using anti-BTK Phospho (pY223) and anti-BTK. (a) The specific BTK inhibitor LFM-A13 reduces phosphorylation of BTK. BT474 cells containing the stably integrated Btk-A or Btk-C flag tag MarxIV vectors were incubated with 100 μM LFM-A13 for 45 min. Tyrosine-phosphorylated BTK was assessed by immunoprecipitation (IP) using anti-Flag and Western blot (WB) analysis using anti-BTK Phospho (pY223) and anti-BTK. (b & c) Inhibition of BTK autophosphorylation using LFM-A13 results in increased apoptosis. Incubation of BT474 cells with either 25 μM or 35 μM LFM-A13 results in increased cleaved caspase-3 (CC3) compared to control cells treated with DMSO. (c) Apoptotic cells were calculated as a percentage of the total cellular population.

FIG. 6. BTK protein is present in BT474 cellular cytoplasm. (a) Confocal immunofluorescence images of BTK in BT474 cells. Left column: Alexa 568 2° ab (no primary ab); middle column: BTK ab; right column: Flag Tag ab. (b) BT474 cells containing the stably integrated (left panels) control vector, (middle panels) BTK-A-flag vector, (right panels) BTK-C-flag vector. Left panels: nuclei visualized with Hoechst; right panels: primary antibody (Btk or Flag) bound to secondary HRP conjugated antibody tagged with Alexa 568 tag.

FIG. 7. Btk-C is more abundant in breast cancer cells than in non-tumorigenic breast cells or a malignant B-cell line. (a) qPCR primers were designed to specifically target the Btk-C message, and cDNA from the breast cancer cell lines BT474, MCF7, and MDA-MB-361, the non-tumorigenic breast cell lines MCF10a and HMEC, and a malignant B-cell line was amplified using SYBR Green. The breast cancer cell lines BT474 and MCF7 had at least 4-fold more transcript compared to the non-tumorigenic breast cell lines MCF10a and HMEC and the malignant B-cell line Namalwa. Fold change was calculated using the delta-delta Ct (ΔΔCt) method.
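The fold-change calculation named in the FIG. 7 legend can be illustrated with the commonly used form of the delta-delta Ct calculation, fold change = 2^(−ΔΔCt). The sketch below assumes roughly 100% amplification efficiency and normalization to a reference (housekeeping) gene; the legend does not name a reference gene, and the Ct values shown are hypothetical placeholders rather than measured data.

```python
def delta_delta_ct_fold_change(ct_target_sample, ct_ref_sample,
                               ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a target transcript in a sample versus a calibrator.

    Assumes the amplicon doubles every cycle (about 100% efficiency).
    """
    # Normalize the target Ct to the reference gene within each cell line.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the normalized sample to the normalized calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target crosses threshold 2 cycles earlier in the sample.
fold = delta_delta_ct_fold_change(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_calibrator=26.0, ct_ref_calibrator=18.0,
)
print(fold)  # 4.0
```

A target that reaches threshold two cycles earlier than in the calibrator, with equal reference-gene Ct values, corresponds to a 4-fold difference.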

Supplemental FIG. 1. Reverse Transcriptase Polymerase Chain Reaction (RT-PCR). Total RNA was isolated from the BT474 breast cancer cell line, a normal breast epithelial cell line (HMEC) and a Btk positive control cell line (Namalwa B-cells), and cDNA was amplified from each. Two distinct primer pairs were generated to target the Btk transcript at different regions of the mRNA (5′UTR and Btk internal), and cDNA from each cell type was used as substrate in a polymerase chain reaction (PCR). (a) PCR products were amplified from cDNAs isolated from B-cells for all primer pair combinations used. A product was amplified from BT474 cDNA using the Btk internal forward primer but not when the 5′UTR forward primer was used.

Supplemental FIG. 2. Reverse Transcriptase Polymerase Chain Reaction (RT-PCR). Total RNA was isolated from several breast cancer cell lines (BT474, MCF7, MDA-MB-361) and two normal cell lines, HMEC and MCF10a, and cDNA was amplified from each. Primer pairs were designed to specifically target the Btk-A message, and cDNA from each cell type was used as substrate in a polymerase chain reaction (PCR). (a) A PCR product was amplified from cDNA isolated from the B-cell line using the Btk-A specific primers but no product was amplified for any of the breast cell lines tested.

DEFINITIONS

To facilitate the understanding of this invention a number of terms (set off in quotation marks in this Definitions section) are defined below. Terms defined herein (unless otherwise specified) have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. As used in this specification and its appended claims, terms such as “a”, “an” and “the” are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration, unless the context dictates otherwise. The terminology herein is used to describe specific embodiments of the invention, but its usage does not delimit the invention, except as outlined in the claims.

The phrase “chosen from A, B, and C” as used herein, means selecting one or more of A, B, C.

As used herein, absent an express indication to the contrary, the term “or” when used in the expression “A or B,” where A and B refer to a composition, disease, product, etc., means one or the other, or both. As used herein, the term “comprising” when placed before the recitation of steps in a method means that the method encompasses one or more steps that are additional to those expressly recited, and that the additional one or more steps may be performed before, between, and/or after the recited steps. For example, a method comprising steps a, b, and c encompasses a method of steps a, b, x, and c, a method of steps a, b, c, and x, as well as a method of steps x, a, b, and c. Furthermore, the term “comprising” when placed before the recitation of steps in a method does not (although it may) require sequential performance of the listed steps, unless the context clearly dictates otherwise. For example, a method comprising steps a, b, and c encompasses, for example, a method of performing steps in the order of steps a, c, and b, the order of steps c, b, and a, and the order of steps c, a, and b, etc.

Unless otherwise indicated, all numbers expressing quantities of ingredients, properties such as molecular weights, reaction conditions, and so forth as used in the specification and claims are to be understood as being modified in all instances by the term “about.” Accordingly, unless indicated to the contrary, the numerical parameters in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the present invention. At the very least, and without limiting the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters describing the broad scope of the invention are approximations, the numerical values in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains standard deviations that necessarily result from the errors found in the numerical value's testing measurements.

The term “not” when preceding, and made in reference to, any particularly named molecule (mRNA, etc.) or phenomenon (such as biological activity, biochemical activity, etc.) means that only the particularly named molecule or phenomenon is excluded.

The term “altering” and grammatical equivalents as used herein in reference to the level of any substance and/or phenomenon refers to an increase and/or decrease in the quantity of the substance and/or phenomenon, regardless of whether the quantity is determined objectively, and/or subjectively.

The terms “increase,” “elevate,” “raise,” and grammatical equivalents when used in reference to the level of a substance and/or phenomenon in a first sample relative to a second sample, mean that the quantity of the substance and/or phenomenon in the first sample is higher than in the second sample by any amount that is statistically significant using any art-accepted statistical method of analysis. In one embodiment, the increase may be determined subjectively, for example when a patient refers to their subjective perception of disease symptoms, such as pain, clarity of vision, etc. In another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 10% greater than the quantity of the same substance and/or phenomenon in a second sample. In another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 25% greater than the quantity of the same substance and/or phenomenon in a second sample. In yet another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 50% greater than the quantity of the same substance and/or phenomenon in a second sample. In a further embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 75% greater than the quantity of the same substance and/or phenomenon in a second sample. In yet another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 90% greater than the quantity of the same substance and/or phenomenon in a second sample. Alternatively, a difference may be expressed as an “n-fold” difference.

The terms “reduce,” “inhibit,” “diminish,” “suppress,” “decrease,” and grammatical equivalents when used in reference to the level of a substance and/or phenomenon in a first sample relative to a second sample, mean that the quantity of substance and/or phenomenon in the first sample is lower than in the second sample by any amount that is statistically significant using any art-accepted statistical method of analysis. In one embodiment, the reduction may be determined subjectively, for example when a patient refers to their subjective perception of disease symptoms, such as pain, clarity of vision, etc. In another embodiment, the quantity of substance and/or phenomenon in the first sample is at least 10% lower than the quantity of the same substance and/or phenomenon in a second sample. In another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 25% lower than the quantity of the same substance and/or phenomenon in a second sample. In yet another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 50% lower than the quantity of the same substance and/or phenomenon in a second sample. In a further embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 75% lower than the quantity of the same substance and/or phenomenon in a second sample. In yet another embodiment, the quantity of the substance and/or phenomenon in the first sample is at least 90% lower than the quantity of the same substance and/or phenomenon in a second sample. Alternatively, a difference may be expressed as an “n-fold” difference.
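The percentage thresholds and the “n-fold” difference used in the definitions of “increase” and “reduce” can be expressed arithmetically. The sketch below is illustrative only: the function names are hypothetical, the quantities are placeholders, and it does not perform the statistical significance testing that the definitions also refer to.

```python
def percent_difference(first, second):
    """Signed percent by which `first` differs from `second` (positive means higher)."""
    if second == 0:
        raise ValueError("second sample quantity must be non-zero")
    return (first - second) / second * 100.0


def n_fold(first, second):
    """Ratio of the first sample quantity to the second."""
    if second == 0:
        raise ValueError("second sample quantity must be non-zero")
    return first / second


def is_reduced_by_at_least(first, second, threshold_percent):
    """True if `first` is at least `threshold_percent` lower than `second`."""
    return percent_difference(first, second) <= -threshold_percent


# Placeholder quantities in arbitrary units.
print(percent_difference(25.0, 100.0))          # -75.0
print(n_fold(25.0, 100.0))                      # 0.25
print(is_reduced_by_at_least(25.0, 100.0, 50))  # True
print(is_reduced_by_at_least(25.0, 100.0, 90))  # False
```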

A number of terms herein relate to cancer. “Cancer” is intended herein to encompass all forms of abnormal or improperly regulated reproduction of cells in a subject. “Subject” and “patient” are used herein interchangeably, and a subject may be any mammal but is preferably a human. A “reference subject” herein refers to an individual who does not have cancer. The “reference subject” thereby provides a basis to which another cell (for example a cancer cell) can be compared.

The growth of cancer cells (“growth” herein referring generally to cell division but also to the growth in size of masses of cells) is characteristically uncontrolled or inadequately controlled, as is the death (“apoptosis”) of such cells. Local accumulations of such cells result in a tumor. More broadly, and still denoting “tumors” herein, are accumulations ranging from a cluster of lymphocytes at a site of infection to vascularized overgrowths, both benign and malignant. A “malignant” tumor (as opposed to a “benign” tumor) herein comprises cells that tend to migrate to nearby tissues, including cells that may travel through the circulatory system to invade or colonize tissues or organs at considerable remove from their site of origin in the “primary tumor,” so-called herein. Metastatic cells are adapted to penetrate blood vessel walls to enter (“intravasate”) and exit (“extravasate”) blood vessels. Tumors capable of releasing such cells are also referred to herein as “metastatic.” The term is used herein also to denote any cell in such a tumor that is capable of such travel, or that is en route, or that has established a foothold in a target tissue. For example, a metastatic breast cancer cell that has taken root in the lung is referred to herein as a “lung metastasis.” Metastatic cells may be identified herein by their respective sites of origin and destination, such as “breast-to-bone metastatic.” In the target tissue, a colony of metastatic cells can grow into a “secondary tumor,” so called herein.

Primary tumors are thought to derive from a benign or normal cell through a process referred to herein as “cancer progression.” According to this view, the transformation of a normal cell to a cancer cell requires changes (usually many of them) in the cell's biochemistry. The changes are reflected clinically as the disease progresses through stages. Even if a tumor is “clonogenic” (as used herein, an accumulation of the direct descendants of a parent cell), the biochemistry of the accumulating cells changes in successive generations, both because the expression of the genes (controlled by so-called “epigenetic” systems) of these cells becomes unstable and because the genomes themselves change. In normal somatic cells, the genome (that is, all the genes of an individual) is stored in the chromosomes of each cell (setting aside the mitochondrial genome). The number of copies of any particular gene is largely invariant from cell to cell. By contrast, “genomic instability” is characteristic of cancer progression. A genome in a cancer cell can gain (“genomic gain”) or lose (“genomic loss”) genes, typically because an extra copy of an entire chromosome appears (“trisomy”) or a region of a chromosome replicates itself (“genomic gain” or, in some cases, “genomic amplification”) or drops out when the cell divides. Thus, the “copy number” of a gene or a set of genes, largely invariant among normal cells, is likely to change in cancer cells (referred to herein as a “genomic event”), which affects the total expression of the gene or gene set and the biological behavior (“phenotype”) of descendent cells. Thus, in cancer cells, “gene activity” herein is determined not only by the multiple “layers” of epigenetic control systems and signals that call forth expression of the gene but by the number of times that gene appears in the genome.

The term “epigenetic” herein refers to any process in an individual that, in operation, affects the expression of a gene or a set of genes in that individual, and stands in contrast to the “genetic” processes that govern the inheritance of genes in successive generations of cells or individuals.

Certain regions of chromosomes, depending upon the specific type of cancer, have proven to be hot spots for genomic gain inasmuch as increases in copy number in the genomes of cells from multiple donors tend to occur in one or a few specific regions of a specific chromosome. Such hot spots are referred to herein as sites of “recurrent genomic gain.” The term is to be distinguished from “recurrent cancer,” which refers to types of cancer that are likely to recur after an initial course of therapy, resulting in a “relapse.”

A number of terms herein relate to methods that enable the practitioner to examine many distinct genes at once. By these methods, sets of genes (“gene sets”) have been identified wherein each set has biologically relevant and distinctive properties as a set. Devices (which may be referred to herein as “platforms”) in which each gene in a significant part of an entire genome is isolated and arranged in an array of spots, each spot having its own “address,” enable one to detect, quantitatively, many thousands of the genes in a cell. More precisely, these “microarrays” typically detect expressed genes (an “expressed” gene is one that is actively transmitting its unique biochemical signal to the cell in which the gene resides). Microarray data, inasmuch as they display the expression of many genes at once, permit the practitioner to view “gene expression profiles” in a cell and to compare those profiles cell-to-cell to perform so-called “comparative analyses of expression profiles.” Such microarray-based “expression data” are capable of identifying genes that are “overexpressed” (or underexpressed) in, for example, a disease condition. An overexpressed gene may be referred to herein as having a high “expression score.”

The aforementioned methods for examining gene sets employ a number of well-known methods in molecular biology, to which references are made herein. A gene is a heritable chemical code resident in, for example, a cell, virus, or bacteriophage that an organism reads (decodes, decrypts, transcribes) as a template for ordering the structures of biomolecules that an organism synthesizes to impart regulated function to the organism. Chemically, a gene is a heteropolymer comprised of subunits (“nucleotides”) arranged in a specific sequence. In cells, such heteropolymers are deoxyribonucleic acids (“DNA”) or ribonucleic acids (“RNA”). DNA forms long strands. Characteristically, these strands occur in pairs. The first member of a pair is not identical in nucleotide sequence to the second strand, but complementary. The tendency of a first strand to bind in this way to a complementary second strand (the two strands are said to “anneal” or “hybridize”), together with the tendency of individual nucleotides to line up against a single strand in a complementarily ordered manner, accounts for the replication of DNA.
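Strand complementarity as described above can be illustrated with a short sketch. It assumes the standard base pairing (A with T, C with G) and the antiparallel orientation of paired DNA strands, treats only perfect matches as able to anneal, and uses a made-up six-nucleotide sequence. The function names are hypothetical.

```python
# Translation table implementing standard base pairing: A<->T and C<->G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_anneal(strand_a, strand_b):
    """True if strand_b is the perfect reverse complement of strand_a.

    Simplification: real hybridization tolerates mismatches depending on
    temperature and salt conditions, which this check ignores.
    """
    return strand_b.upper() == reverse_complement(strand_a)


print(reverse_complement("ATGGCC"))    # GGCCAT
print(can_anneal("ATGGCC", "GGCCAT"))  # True
print(can_anneal("ATGGCC", "ATGGCC"))  # False
```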

Experimentally, nucleotide sequences selected for their complementarity can be made to anneal to a strand of DNA containing one or more genes. A single such sequence can be employed to identify the presence of a particular gene by attaching itself to the gene. This so-called “probe” sequence is adapted to carry with it a “marker” that the investigator can readily detect as evidence that the probe struck a target. As used herein, the term “marker” relates to any surrogate the artisan may use to “observe” an event or condition that is difficult or impossible to detect directly. In some contexts herein, the marker is said to “target” the condition or event. In other contexts, the condition or event is referred to as the target for the marker. Sequences used as probes may be quite small (e.g., “oligonucleotides” of <20 nucleotides) or quite large (e.g., a sequence of 100,000 nucleotides in DNA from a “bacterial artificial chromosome” or “BAC”). A BAC is a bacterial chromosome (or a portion thereof) with a “foreign” (typically, human) DNA fragment inserted in it. BACs are employed in a technique referred to herein as “fluorescence in situ hybridization” or “FISH.” A BAC or a portion of a BAC is constructed that has (1) a sequence complementary to a region of interest on a chromosome and (2) a marker whose presence is discernible by fluorescence. The chromosomes of a cell or a tissue are isolated (on a glass slide, for example) and treated with the BAC construct. Excess construct is washed away and the chromosomes examined microscopically to find chromosomes or, more particularly, identifiable regions of chromosomes that fluoresce.

Alternatively, such sequences can be delivered in pairs selected to hybridize with two specific sequences that bracket a gene sequence. A complementary strand of DNA then forms between the “primer pair.” In one well-known method, the “polymerase chain reaction” or “PCR,” the formation of complementary strands can be made to occur repeatedly in an exponential amplification. A specific nucleotide sequence so amplified is referred to herein as the “amplicon” of that sequence. “Quantitative PCR” or “qPCR” herein refers to a version of the method that allows the artisan not only to detect the presence of a specific nucleic acid sequence but also to quantify how many copies of the sequence are present in a sample, at least relative to a control. As used herein, “qRTPCR” may refer to “quantitative real-time PCR,” used interchangeably with “qPCR” as a technique for quantifying the amount of a specific DNA sequence in a sample. However, if the context so admits, the same abbreviation may refer to “quantitative reverse transcriptase PCR,” a method for determining the amount of messenger RNA present in a sample. Since the presence of a particular messenger RNA in a cell indicates that a specific gene is currently active (being expressed) in the cell, this quantitative technique finds use, for example, in gauging the level of expression of a gene.
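The exponential amplification described for PCR can be made concrete with a small calculation. The sketch below assumes an idealized model in which each cycle multiplies the number of amplicon copies by (1 + efficiency), with efficiency equal to 1.0 for perfect doubling; real reactions plateau in later cycles, which this model ignores. The function name and the starting copy number are hypothetical.

```python
def amplicon_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of amplicon copies after a given number of PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle (doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ten starting template copies after 30 cycles of perfect doubling.
print(amplicon_copies(10, 30))  # 10737418240.0
```

Because each cycle multiplies the copy number, a sample that starts with more template reaches a detection threshold in fewer cycles, which is the basis for the quantification described for qPCR.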

Collectively, the genes of an organism constitute its genome. The term “genomic DNA” may refer herein to the entirety of an organism's DNA or to the entirety of the nucleotides comprising a single gene in an organism. A gene typically contains sequences of nucleotides devoted to coding (“exons”), and non-coding sequences that contribute in one way or another to the decoding process (“introns”).

The term “gene” refers to a nucleic acid (e.g., DNA) comprising covalently linked nucleotide monomers arranged in a particular sequence that comprises a coding sequence necessary for the production of a polypeptide or precursor or RNA (e.g., tRNA, siRNA, rRNA, etc.). The polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence so long as the desired activities or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, etc.) of the full-length or fragment are retained. The term also encompasses the coding region together with the sequences located adjacent to the coding region on both the 5′ and 3′ ends, such that the gene corresponds to the length of the full-length mRNA (also referred to as “pre-mRNA,” “nuclear RNA,” or “primary transcript RNA”) transcribed from it. The sequences that are located 5′ of the coding region and are present on the mRNA are referred to as 5′ untranslated sequences. The sequences that are located 3′ or downstream of the coding region and that are present on the mRNA are referred to as 3′ untranslated sequences. The term “gene” encompasses both cDNA (the coding region(s) only) and genomic forms of a gene. A genomic form or clone of a gene contains the coding region, which may be interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are removed or “spliced out” from the nuclear or primary transcript, and are therefore absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

Encoding in DNA (and messenger RNA) is accomplished by 3-membered nucleotide sequences called “codons.” Each codon encrypts an amino acid, and the sequence of codons encrypts the sequence of amino acids that identifies a particular protein. The code for a given gene is embedded in a (usually) much longer nucleotide sequence and is distinguishable to the cell's decoding system from the longer sequence by a “start codon” and a “stop” codon. The decoding system reads the sequence framed by these two codons (the so-called “open reading frame”). The readable code is transcribed into messenger RNA which itself comprises sites that ensure coherent translation of the code from nucleic acid to protein. In particular, the open reading frame is delimited by a so-called “translation initiation” codon and “translation termination” codon.
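The reading of an open reading frame, from a start codon through successive three-nucleotide codons to a stop codon, can be sketched as follows. This is an illustration under stated assumptions: it uses the standard genetic code, treats ATG as the only start codon, scans a single DNA strand, and uses a made-up input sequence. The function name `first_orf` is hypothetical, and characters other than A, C, G and T are not handled.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def first_orf(dna):
    """Return (start_index, protein) for the first ATG that is followed by an
    in-frame stop codon, or None if no such open reading frame exists."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        protein = []
        # Step through the sequence one codon (three nucleotides) at a time.
        for i in range(start, len(dna) - 2, 3):
            amino_acid = CODON_TABLE[dna[i:i + 3]]
            if amino_acid == "*":  # stop codon closes the reading frame
                return start, "".join(protein)
            protein.append(amino_acid)
        # No in-frame stop codon found; try the next ATG.
        start = dna.find("ATG", start + 1)
    return None


# Made-up sequence: ATG GCT AAA TGA reads as Met-Ala-Lys followed by a stop.
print(first_orf("CCATGGCTAAATGAGG"))  # (2, 'MAK')
```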

The term “plasmid” as used herein, refers to a small, independently replicating, piece of DNA. Similarly, the term “naked plasmid” refers to plasmid DNA devoid of extraneous material typically used to effect transfection. As used herein, a “naked plasmid” refers to a plasmid substantially free of calcium-phosphate, DEAE-dextran, liposomes, and/or polyamines. As used herein, the term “purified” refers to molecules (polynucleotides or polypeptides) that are removed from their natural environment, isolated or separated. “Purified” molecules are at least 50% free, preferably at least 75% free, and more preferably at least 90% free from other components with which they are naturally associated.

The term “recombinant DNA” refers to a DNA molecule that is comprised of segments of DNA joined together by means of molecular biology techniques. Similarly, the term “recombinant protein” refers to a protein molecule that is expressed from recombinant DNA.

The term “fusion protein” as used herein refers to a protein formed by expression of a hybrid gene made by combining two gene sequences. Typically this is accomplished by cloning a cDNA into an expression vector in frame (i.e., in an arrangement that the cell can transcribe as a single mRNA molecule) with an existing gene. The fusion partner may act as a reporter (e.g., βgal) or may provide a tool for isolation purposes (e.g., GST).

Where an amino acid sequence is recited herein to refer to an amino acid sequence of a protein molecule, “amino acid sequence” and like terms, such as “polypeptide” or “protein” are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule. Rather the terms “amino acid sequence” and “protein” encompass partial sequences, and modified sequences.

The term “wild type” refers to a gene or gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild type gene is the variant most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene.

In contrast, the terms “modified,” “mutant,” and “variant” (when the context so admits) refer to a gene or gene product that displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. In some embodiments, the modification comprises at least one nucleotide insertion, deletion, or substitution.

The term “homology” refers to a degree of complementarity. There may be partial homology or complete homology (i.e., identity). A partially complementary sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid and is referred to using the functional term “substantially homologous.” The term “inhibition of binding,” when used in reference to nucleic acid binding, refers to reduction in binding caused by competition of homologous sequences for binding to a target sequence. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target that lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target. When used in reference to a single-stranded nucleic acid sequence, the term “substantially homologous” refers to any probe that can hybridize (i.e., it is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.

As used herein, the term “competes for binding” when used in reference to a first and a second polypeptide means that the first polypeptide with an activity binds to the same substrate as does the second polypeptide with an activity. In one embodiment, the second polypeptide is a variant of the first polypeptide (e.g., encoded by a different allele) or a related (e.g., encoded by a homolog) or dissimilar (e.g., encoded by a second gene having no apparent relationship to the first gene) polypeptide. The efficiency (e.g., kinetics or thermodynamics) of binding by the first polypeptide may be the same as or greater than or less than the efficiency of substrate binding by the second polypeptide. For example, the equilibrium binding constant (K_(D)) for binding to the substrate may be different for the two polypeptides.

As used herein, the term “hybridization” refers to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the T_(m) of the formed hybrid, and the G:C ratio within the nucleic acids.

As used herein, the term “T_(m)” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the T_(m) of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the T_(m) value may be calculated by the equation: T_(m)=81.5+0.41(% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl (See e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization [1985]). Other references include more sophisticated computations that take structural as well as sequence characteristics into account for the calculation of T_(m).
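As a minimal sketch only, the simple estimate given above, T_(m)=81.5+0.41(% G+C), can be computed as follows. The function name `estimate_tm` is hypothetical; the sketch assumes the nucleic acid is in aqueous solution at 1 M NaCl, as stated for the equation, and it ignores the structural and sequence refinements mentioned for more sophisticated computations.

```python
def estimate_tm(sequence):
    """Estimate T_m in degrees Celsius as 81.5 + 0.41 * (% G+C)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in sequence if base in "GC")
    percent_gc = 100.0 * gc_count / len(sequence)
    return 81.5 + 0.41 * percent_gc
```

Because the estimate depends only on base composition, two sequences with the same percentage of G+C receive the same value regardless of length or base order.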

As used herein the term “stringency” is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Those skilled in the art will recognize that “stringency” conditions may be altered by varying the parameters just described either individually or in concert. With “high stringency” conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences (e.g., hybridization under “high stringency” conditions may occur between homologs with 85-100% identity, preferably 70-100% identity). With medium stringency conditions, nucleic acid base pairing will occur between nucleic acids with an intermediate frequency of complementary base sequences (e.g., hybridization under “medium stringency” conditions may occur between homologs with 50-70% identity). Thus, conditions of “weak” or “low” stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

“High stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution comprising 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 0.1× SSPE, 1.0% SDS at 42° C. when a probe of about 100 to about 1000 nucleotides in length is employed.

“Medium stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution comprising 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 1.0× SSPE, 1.0% SDS at 42° C. when a probe of about 100 to about 1000 nucleotides in length is employed.

“Low stringency conditions” comprise conditions equivalent to binding or hybridization at 42° C. in a solution comprising 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5× Denhardt's reagent [50× Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 5× SSPE, 0.1% SDS at 42° C. when a probe of about 100 to about 1000 nucleotides in length is employed.

The term “equivalent” when made in reference to a hybridization condition as it relates to a hybridization condition of interest means that the hybridization condition and the hybridization condition of interest result in hybridization of nucleic acid sequences which have the same range of percent (%) homology. For example, if a hybridization condition of interest results in hybridization of a first nucleic acid sequence with other nucleic acid sequences that have from 85% to 95% homology to the first nucleic acid sequence, then another hybridization condition is said to be equivalent to the hybridization condition of interest if this other hybridization condition also results in hybridization of the first nucleic acid sequence with the other nucleic acid sequences that have from 85% to 95% homology to the first nucleic acid sequence.

The following terms are used to describe the sequence relationships between two or more polynucleotides: “reference sequence”, “sequence identity”, “percentage of sequence identity”, and “substantial identity”. A “reference sequence” is a defined sequence used as a basis for a sequence comparison; a reference sequence may be a subset of a larger sequence, for example, as a segment of a full-length cDNA sequence given in a sequence listing or may comprise a complete gene sequence. Generally, a reference sequence is at least 20 nucleotides in length, frequently at least 25 nucleotides in length, and often at least 50 nucleotides in length. Since two polynucleotides may each (1) comprise a sequence (i.e., a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) may further comprise a sequence that is divergent between the two polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a “comparison window” to identify and compare local regions of sequence similarity. A “comparison window”, as used herein, refers to a conceptual segment of at least 20 contiguous nucleotide positions wherein a polynucleotide sequence may be compared to a reference sequence of at least 20 contiguous nucleotides and wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by the local homology algorithm of Smith and Waterman (Smith and Waterman, Adv. Appl. Math., 2: 482, 1981), by the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, J. Mol. Biol., 48:443, 1970), by the search for similarity method of Pearson and Lipman (Pearson and Lipman, Proc. Natl. Acad. Sci., U.S.A., 85:2444, 1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, Madison, Wis.), or by inspection, and the best alignment (i.e., resulting in the highest percentage of homology over the comparison window) generated by the various methods is selected. The term “sequence identity” means that two polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison. The term “percentage of sequence identity” is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. The term “substantial identity” as used herein denotes a characteristic of a polynucleotide sequence, wherein the polynucleotide comprises a sequence that has at least 85 percent sequence identity, preferably at least 90 to 95 percent sequence identity, more usually at least 99 percent sequence identity as compared to a reference sequence over a comparison window of at least 20 nucleotide positions, frequently over a window of at least 25-50 nucleotides, wherein the percentage of sequence identity is calculated by comparing the reference sequence to the polynucleotide sequence which may include deletions or additions which total 20 percent or less of the reference sequence over the window of comparison. The reference sequence may be a subset of a larger sequence, for example, as a segment of the full-length sequences of the compositions claimed in the present invention.
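For illustration only, the calculation of “percentage of sequence identity” described above can be sketched as follows. The function name `percent_identity` is hypothetical. The sketch assumes that the two sequences have already been optimally aligned over the comparison window by one of the methods named above, that gaps are written as “-”, and that both aligned strings therefore have the same length; it does not itself perform an alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """Matched positions divided by window size, multiplied by 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    # A position matches only when the identical base occurs in both sequences;
    # a gap never counts as a match.
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

In this sketch a ten-position window with a single mismatch gives 90 percent sequence identity, and a gapped position lowers the percentage in the same way as a mismatch.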

As applied to polypeptides, the term “substantial identity” means that two peptide sequences, when optimally aligned, such as by the programs GAP or BESTFIT using default gap weights, share at least 80 percent sequence identity, preferably at least 90 percent sequence identity, more preferably at least 95 percent sequence identity or more (e.g., 99 percent sequence identity). Preferably, residue positions which are not identical differ by conservative amino acid substitutions. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having acidic side chains is glutamic acid and aspartic acid; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.

“Amplification” is used herein in two different ways. A given gene typically appears in a genome once, on one chromosome. Since chromosomes in somatic cells of eukaryotes are in general paired, two copies or alleles of each gene are found. In some conditions, such as cancer, replication of chromosome pairs during cell division is disturbed so that multiple copies of a gene or chromosome accrue over successive generations. The phenomenon is referred to generally (and herein) as “amplification.”

In the context of molecular biological experimentation, the term is used differently. Experimentally, “amplification” is used in relation to a special case of nucleic acid replication involving template specificity. It is to be contrasted with non-specific template replication (i.e., replication that is template-dependent but not dependent on a specific template). Template specificity is here distinguished from fidelity of replication (i.e., synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in terms of “target” specificity. Target sequences are “targets” in the sense that they are sought to be sorted out from other nucleic acid. Amplification techniques have been designed primarily for this sorting out.

Template specificity is achieved in most amplification techniques by the choice of enzyme. Amplification enzymes are enzymes that, under the conditions in which they are used, will process only specific sequences of nucleic acids in a heterogeneous mixture of nucleic acids. In particular, Taq and Pfu polymerases, by virtue of their ability to function at high temperature, are found to display high specificity for the sequences bounded and thus defined by the primers; the high temperature results in thermodynamic conditions that favor primer hybridization with the target sequences and not hybridization with non-target sequences.

As used herein, the term “sample template” refers to nucleic acid originating from a sample that is analyzed for the presence of “target” (defined below). In contrast, “background template” is used in reference to nucleic acid other than sample template that may or may not be present in a sample. Background template is most often inadvertent. It may be the result of carryover, or it may be due to the presence of nucleic acid contaminants sought to be purified away from the sample. For example, nucleic acids from organisms other than those to be detected may be present as background in a test sample.

As used herein, the term “primer” refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.

As used herein, the term “probe” refers to an oligonucleotide (i.e., a sequence of nucleotides), whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR amplification, that is capable of hybridizing to another oligonucleotide of interest. A probe may be single-stranded or double-stranded. Probes are useful in the detection, identification and isolation of particular sequences. It is contemplated that any probe used in the present invention will be labelled with any “reporter molecule,” so that it is detectable in any detection system, including, but not limited to enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent, radioactive, and luminescent systems. It is not intended that the present invention be limited to any particular detection system or label.

As used herein, the term “target,” when used in reference to the polymerase chain reaction, refers to the region of nucleic acid bounded by the primers used for polymerase chain reaction. Thus, the “target” is sought to be sorted out from other nucleic acid sequences. A “segment” is defined as a region of nucleic acid within the target sequence.

As used herein, the term “polymerase chain reaction” (“PCR”) refers to the method of Mullis (U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, hereby incorporated by reference), which describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers are then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.”
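The statement that the length of the amplified segment is determined by the relative positions of the primers can be illustrated with a short sketch. The function names `reverse_complement` and `amplified_segment` and the example sequences are hypothetical. The sketch assumes exact primer matches on a single template strand written 5′ to 3′, with the reverse primer annealing downstream of the forward primer; it models neither thermal cycling nor mismatched annealing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def amplified_segment(template, forward_primer, reverse_primer):
    """Return the region bounded by the two primers, or None if either fails to match."""
    template = template.upper()
    forward = forward_primer.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site reads as its reverse complement.
    site = reverse_complement(reverse_primer)
    site_start = template.find(site, start + len(forward))
    if site_start == -1:
        return None
    return template[start:site_start + len(site)]
```

Moving either primer along the template changes the length of the returned segment, which is the sense in which the length is a controllable parameter.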

As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

The term “isolated” when used in relation to a nucleic acid, as in “an isolated oligonucleotide” or “isolated polynucleotide” refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid with which it is ordinarily associated in its natural source. Isolated nucleic acid is present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. However, isolated nucleic acid encoding a gene includes, by way of example, such nucleic acid in cells ordinarily expressing the gene where the nucleic acid is in a chromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid, oligonucleotide or polynucleotide is to be utilized to express a protein, the oligonucleotide or polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded), but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).

The terms “fragment” and “portion” when used in reference to a nucleotide sequence (as in “a portion of a given nucleotide sequence”) refer to partial segments of that sequence. The fragments may range in size from four nucleotides to the entire nucleotide sequence minus one nucleotide (e.g., 10 nucleotides, 20, 30, 40, 50, 100, 200, etc.).

Similarly, the terms “fragment” and “portion” when used in reference to a polypeptide sequence refer to partial segments of that sequence. In some embodiments, the portion has an amino-terminal and/or carboxy-terminal deletion as compared to the native protein, but where the remaining amino acid sequence is identical to the corresponding positions in the amino acid sequence deduced from a full-length cDNA sequence. Fragments are preferably at least 4 amino acids long, more preferably at least 50 amino acids long, and most preferably at least 50 amino acids long or longer (the entire amino acid sequence minus one amino acid). In particularly preferred embodiments, the portion comprises the amino acid residues required for intermolecular binding of the compositions of the present invention with its various ligands and/or substrates.

As used herein the term “portion” when in reference to a protein (as in “a portion of a given protein”) refers to fragments of that protein. The fragments may range in size from four consecutive amino acid residues to the entire amino acid sequence minus one amino acid.

As used herein the term “coding region” when used in reference to a structural gene refers to the nucleotide sequences that encode the amino acids found in the nascent polypeptide as a result of translation of a mRNA molecule. The coding region is bounded, in eukaryotes, on the 5′ side by the nucleotide triplet “ATG” that encodes the initiator methionine and on the 3′ side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA).

The term “recombinant DNA molecule” as used herein refers to a DNA molecule that is comprised of segments of DNA joined together by means of molecular biological techniques. Similarly, the term “recombinant protein” or “recombinant polypeptide” as used herein refers to a protein molecule that is expressed from a recombinant DNA molecule.

The term “native protein” is used herein to indicate that a protein does not contain amino acid residues encoded by vector sequences; that is, the native protein contains only those amino acids found in the protein as it occurs in nature. A native protein may be produced by recombinant means or may be isolated from a naturally occurring source.

The term “Southern blot,” refers to the analysis of DNA on agarose or acrylamide gels to fractionate the DNA according to size followed by transfer of the DNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized DNA is then probed with a labeled probe to detect DNA species complementary to the probe used. The DNA may be cleaved with restriction enzymes prior to electrophoresis. Following electrophoresis, the DNA may be partially depurinated and denatured prior to or during transfer to the solid support. Southern blots are a standard tool of molecular biologists (Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, NY, pp 9.31-9.58, 1989).

The term “Northern blot,” as used herein refers to the analysis of RNA by electrophoresis of RNA on agarose gels to fractionate the RNA according to size followed by transfer of the RNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized RNA is then probed with a labeled probe to detect RNA species complementary to the probe used. Northern blots are a standard tool of molecular biologists (Sambrook, et al., supra, pp 7.39-7.52, 1989).

The term “Western blot” refers to the analysis of protein(s) (or polypeptides) immobilized onto a support such as nitrocellulose or a membrane. The proteins are run on acrylamide gels to separate the proteins, followed by transfer of the protein from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized proteins are then exposed to antibodies with reactivity against an antigen of interest. The binding of the antibodies may be detected by various methods, including the use of radiolabelled antibodies.

As used herein, the term “transgenic” refers to a cell or organism whose genome has been heritably altered by genetically engineering into the genome a gene (“transgene”) not normally part of it or removing from it a gene ordinarily present (a “knockout” gene). The “transgene” or “foreign gene” may be placed into an organism by introducing it into newly fertilized eggs or early embryos. The term “foreign gene” refers to any nucleic acid (e.g., gene sequence) that is introduced into the genome of an animal by experimental manipulations and may include gene sequences found in that animal so long as the introduced gene does not reside in the same location as does the naturally-occurring gene.

As used herein, the term “vector” is used in reference to nucleic acid molecules that transfer DNA segment(s) from one cell to another. The term “vehicle” is sometimes used interchangeably with “vector.”

The term “expression vector” as used herein refers to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes usually include a promoter, an operator (optional), and a ribosome binding site, often along with other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.

As used herein, the term “host cell” refers to any eukaryotic or prokaryotic cell (e.g., bacterial cells such as E. coli, yeast cells, mammalian cells, avian cells, amphibian cells, plant cells, fish cells, and insect cells), whether located in vitro or in vivo. For example, host cells may be located in a transgenic animal.

The term “transfection” as used herein refers to the introduction of foreign DNA into eukaryotic cells. Transfection may be accomplished by a variety of means known to the art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral infection, and biolistics.

The term “stable transfection” or “stably transfected” refers to the introduction and integration of foreign DNA into the genome of the transfected cell. The term “stable transfectant” refers to a cell that has stably integrated foreign DNA into the genomic DNA.

The term “transient transfection” or “transiently transfected” refers to the introduction of foreign DNA into a cell where the foreign DNA fails to integrate into the genome of the transfected cell in a manner that would allow the foreign DNA to be passed on to daughter cells. The term encompasses transfections of foreign DNA into the cytoplasm only. In general, however, the foreign DNA reaches the nucleus of the transfected cell and persists there for several days. During this time the foreign DNA is subject to the regulatory controls that govern the expression of endogenous genes in the chromosomes. The term “transient transfectant” refers to cells that have taken up foreign DNA but have failed to integrate this DNA.

The term “calcium phosphate co-precipitation” refers to a technique for the introduction of nucleic acids into a cell. The uptake of nucleic acids by cells is enhanced when the nucleic acid is presented as a calcium phosphate-nucleic acid co-precipitate. The original technique has been modified to optimize conditions for particular types of cells. The art is well aware of these numerous modifications.

A “composition comprising a given polynucleotide sequence” as used herein refers broadly to any composition containing the given polynucleotide sequence. Such compositions may be employed as hybridization probes, typically in an aqueous solution containing salts (e.g., NaCl), detergents (e.g., SDS), and other components (e.g., Denhardt's solution, dry milk, salmon sperm DNA, etc.).

The terms “N-terminus,” “NH₂-terminus,” and “amino-terminus” refer to the amino acid residue corresponding to the methionine encoded by the start codon (e.g., position or residue 1). In contrast, the terms “C-terminus,” “COOH-terminus,” and “carboxy terminus” refer to the amino acid residue encoded by the final codon (e.g., last or final residue prior to the stop codon).

The term “conservative substitution” as used herein refers to a change that takes place within a family of amino acids that are related in their side chains. Genetically encoded amino acids can be divided into four families: (1) acidic (aspartate, glutamate); (2) basic (lysine, arginine, histidine); (3) nonpolar (alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan); and (4) uncharged polar (glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine). Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. In similar fashion, the amino acid repertoire can be grouped as (1) acidic (aspartate, glutamate); (2) basic (lysine, arginine, histidine); (3) aliphatic (glycine, alanine, valine, leucine, isoleucine, serine, threonine), with serine and threonine optionally grouped separately as aliphatic-hydroxyl; (4) aromatic (phenylalanine, tyrosine, tryptophan); (5) amide (asparagine, glutamine); and (6) sulfur-containing (cysteine and methionine). Whether a change in the amino acid sequence of a peptide results in a functional homolog can be readily determined by assessing the ability of the variant peptide to function in a fashion similar to the wild-type protein. Peptides having more than one replacement can readily be tested in the same manner. In contrast, the term “nonconservative substitution” refers to a change in which an amino acid from one family is replaced with an amino acid from another family (e.g., replacement of a glycine with a tryptophan). Guidance in determining which amino acid residues can be substituted, inserted, or deleted without abolishing biological activity can be found using computer programs (e.g., LASERGENE software, DNASTAR Inc., Madison, Wis.).
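By way of illustration only, the four-family grouping recited above can be expressed as a simple lookup. The following is a minimal sketch; the names FAMILIES, family_of and is_conservative are hypothetical and are not part of any software referenced herein, and the use of one-letter amino acid codes is an assumption made for brevity.

```python
# Four-family grouping taken from the definition above, written with
# standard one-letter amino acid codes (an illustrative choice).
FAMILIES = {
    "acidic": set("DE"),                # aspartate, glutamate
    "basic": set("KRH"),                # lysine, arginine, histidine
    "nonpolar": set("AVLIPFMW"),        # alanine ... tryptophan
    "uncharged polar": set("GNQCSTY"),  # glycine ... tyrosine
}


def family_of(residue):
    """Return the name of the family that contains the given residue."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"not a genetically encoded amino acid code: {residue!r}")


def is_conservative(original, replacement):
    """True when both residues belong to the same side-chain family."""
    return family_of(original) == family_of(replacement)
```

Under this grouping, a leucine-to-isoleucine change is classified as conservative, whereas the glycine-to-tryptophan replacement mentioned above is classified as nonconservative. The sketch classifies a substitution only; it does not predict whether the variant peptide remains functional.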

A peptide sequence and nucleotide sequence may be “endogenous” or “heterologous” (i.e., “foreign”). The term “endogenous” refers to a sequence which is naturally found in the cell or virus into which it is introduced so long as it does not contain some modification relative to the naturally-occurring sequence. The term “heterologous” refers to a sequence which is not endogenous to the cell or virus into which it is introduced. For example, heterologous DNA includes a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in nature, or to which it is ligated at a different location in nature. Heterologous DNA also includes a nucleotide sequence which is naturally found in the cell or virus into which it is introduced and which contains some modification relative to the naturally-occurring sequence. Generally, although not necessarily, heterologous DNA encodes heterologous RNA and heterologous proteins that are not normally produced by the cell or virus into which it is introduced. Examples of heterologous DNA include reporter genes, transcriptional and translational regulatory sequences, DNA sequences which encode selectable marker proteins (e.g., proteins which confer drug resistance), etc. In preferred embodiments, the terms “heterologous antigen” and “heterologous sequence” refer to a non-hepadna virus antigen or amino acid sequence including but not limited to microbial antigens, mammalian antigens and allergen antigens.

The terms “peptide,” “peptide sequence,” “amino acid sequence,” “polypeptide,” and “polypeptide sequence” are used interchangeably herein to refer to at least two amino acids or amino acid analogs which are covalently linked by a peptide bond or an analog of a peptide bond. The term peptide includes oligomers and polymers of amino acids or amino acid analogs. The term peptide also includes molecules which are commonly referred to as peptides, which generally contain from about two (2) to about twenty (20) amino acids. The term peptide also includes molecules which are commonly referred to as polypeptides, which generally contain from about twenty (20) to about fifty (50) amino acids. The term peptide also includes molecules which are commonly referred to as proteins, which generally contain from about fifty (50) to about three thousand (3000) amino acids. The amino acids of the peptide may be L-amino acids or D-amino acids. A peptide, polypeptide or protein may be synthetic, recombinant or naturally occurring. A synthetic peptide is a peptide which is produced by artificial means in vitro.

The terms “oligosaccharide” and “OS” antigen refer to a carbohydrate comprising up to ten component sugars, either O or N linked to the next sugar. Likewise, the terms “polysaccharide” and “PS” antigen refer to polymers of more than ten monosaccharide residues linked glycosidically in branched or unbranched chains.

As used herein, the term “mammalian sequence” refers to synthetic, recombinant or purified sequences (preferably sequence fragments comprising at least one B cell epitope) of a mammal. Exemplary mammalian sequences include cytokine sequences, MHC class I heavy chain sequences, MHC class II alpha and beta chain sequences, and amyloid β-peptide sequences.

The terms “mammals” and “mammalian” refer to animals of the class Mammalia which nourish their young by fluid secreted from mammary glands of the mother, including human beings. The class “mammalian” includes placental animals, marsupial animals, and monotrematal animals. An exemplary “mammal” may be a rodent, primate (including simian and human), ovine, bovine, ruminant, lagomorph, porcine, caprine, equine, canine, feline, ave, etc. Preferred non-human animals are selected from the order Rodentia.

Preferred embodiments of the present invention are primarily directed to vertebrate (backbone or notochord) members of the animal kingdom.

The terms “patient” and “subject” refer to a mammal that may be treated using the methods of the present invention.

The term “control” refers to subjects or samples which provide a basis for comparison for experimental subjects or samples. For instance, the use of control subjects or samples permits determinations to be made regarding the efficacy of experimental procedures. In some embodiments, the term “control subject” refers to a subject that receives a mock treatment (e.g., saline alone).

The terms “diluent” and “diluting agent” as used herein refer to agents used to diminish the strength of an admixture. Exemplary diluents include water, physiological saline solution, human serum albumin, oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents, antibacterial agents such as benzyl alcohol, antioxidants such as ascorbic acid or sodium bisulphite, chelating agents such as ethylene diamine-tetra-acetic acid, buffers such as acetates, citrates or phosphates and agents for adjusting the osmolarity, such as sodium chloride or dextrose.

The terms “carrier” and “vehicle” as used herein refer to usually inactive accessory substances into which a pharmaceutical substance is suspended. Exemplary carriers include liquid carriers (such as water, saline, culture medium, aqueous dextrose, and glycols) and solid carriers (such as carbohydrates exemplified by starch, glucose, lactose, sucrose, and dextrans, anti-oxidants exemplified by ascorbic acid and glutathione, and hydrolyzed proteins).

The term “derived” when in reference to a peptide derived from a source (such as a microbe, cell, etc.) as used herein is intended to refer to a peptide which has been obtained (e.g., isolated, purified, etc.) from the source. Alternatively, or in addition, the peptide may be genetically engineered and/or chemically synthesized.

The terms “operably linked,” “in operable combination,” and “in operable order” as used herein refer to the linkage of nucleic acid sequences such that they perform their intended function. For example, operably linking a promoter sequence to a nucleotide sequence of interest refers to linking the promoter sequence and the nucleotide sequence of interest in a manner such that the promoter sequence is capable of directing the transcription of the nucleotide sequence of interest and/or the synthesis of a polypeptide encoded by the nucleotide sequence of interest. Similarly, operably linking a nucleic acid sequence encoding a protein of interest means linking the nucleic acid sequence to regulatory and other sequences in a manner such that the protein of interest is expressed. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.

The terms “C-terminal portion,” “COOH-terminal portion,” “carboxy terminal portion,” “C-terminal domain,” “COOH-terminal domain,” and “carboxy terminal domain,” when used in reference to an amino acid sequence of interest refer to the amino acid sequence (and portions thereof) that is located from approximately the middle of the amino acid sequence of interest to the C-terminal-most amino acid residue of the sequence of interest.

The terms “specific binding,” “binding specificity,” and grammatical equivalents thereof when made in reference to the binding of a first molecule (such as a polypeptide, glycoprotein, nucleic acid sequence, etc.) to a second molecule (such as a polypeptide, glycoprotein, nucleic acid sequence, etc.) refer to the preferential interaction between the first molecule and the second molecule as compared to the interaction between the second molecule and a third molecule. Specific binding is a relative term that does not require absolute specificity of binding; in other words, the term “specific binding” does not require that the second molecule interact with the first molecule in the absence of an interaction between the second molecule and the third molecule. Rather, it is sufficient that the level of interaction between the first molecule and the second molecule is higher than the level of interaction between the second molecule and the third molecule. “Specific binding” of a first molecule with a second molecule also means that the interaction between the first molecule and the second molecule is dependent upon the presence of a particular structure on or within the first molecule; in other words, the second molecule is recognizing and binding to a specific structure on or within the first molecule rather than to nucleic acids or to molecules in general. For example, if a second molecule is specific for structure “A” that is on or within a first molecule, the presence of a third nucleic acid sequence containing structure A will reduce the amount of the second molecule which is bound to the first molecule.

The term “has the biological activity of a specifically named protein,” when made in reference to the biological activity of a variant of the specifically named protein, refers, for example, to a quantity of binding of an antibody that is specific for the specifically named protein to the variant which is preferably greater than 50% (preferably from 50% to 500%, more preferably from 50% to 200%, most preferably from 50% to 100%), as compared to the quantity of binding of the same antibody to the specifically named protein.

Reference herein to any specifically named nucleotide sequence includes within its scope fragments, homologs, and sequences that hybridize under stringent conditions to the specifically named nucleotide sequence. The term “homolog” of a specifically named nucleotide sequence refers to an oligonucleotide sequence which exhibits greater than or equal to 50% identity to the sequence of interest. Alternatively, or in addition, a homolog of any specifically named nucleotide sequence is defined as an oligonucleotide sequence which has at least 95% identity with the sequence of the nucleotide sequence in issue. In another embodiment, the sequence of the homolog has at least 90% identity, and preferably at least 85% identity with the sequence of the nucleotide sequence in issue.
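For illustration, the identity thresholds recited above may be evaluated with a short calculation. The sketch below assumes that the two sequences have already been aligned to equal length without gaps and that percent identity is computed over the full aligned length, which is only one of several conventions in use; the function names percent_identity and meets_identity_threshold are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two pre-aligned, equal-length
    sequences carry the same residue (case-insensitive)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(candidate, reference, threshold=50.0):
    """True when the candidate reaches the stated percent identity.

    The default of 50.0 mirrors the "greater than or equal to 50%"
    wording above; 85.0, 90.0 or 95.0 may be passed for the other
    embodiments.
    """
    return percent_identity(candidate, reference) >= threshold
```

For example, two ten-nucleotide sequences that differ at a single position have 90% identity under this convention, and so satisfy the 50%, 85% and 90% thresholds but not the 95% threshold.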

Exons, introns, genes and entire gene-sets are characteristically locatable with respect to one another. That is, they have generally invariant “genomic loci” or “genomic positions.” Genes distributed across one or several chromosomes can be mapped to specific locations on specific chromosomes. The field of “cytogenetics” addresses several aspects of gene mapping. First, optical microscopy reveals features of chromosomes that are useful as addresses for genes. In humans, chromosomes are morphologically distinguishable from one another and each (except for the Y-chromosome) has two distinct arms separated by a “centromere.” Each arm has distinctive “bands” occupied by specific genes. Disease-related changes in chromosome number, and changes in banding form the basis for diagnosing a number of diseases. “Microdissection” of chromosomes and DNA analysis of the microdissected fragments have connected specific DNA sequences to specific locations on chromosomes. In cancer, a region of a chromosome may duplicate or amplify itself or drop out entirely. FISH, mentioned above, and “comparative genomic hybridization” (“CGH”) have extended the reach of cytogenetic analysis to the extent of measuring genome alterations within and between individuals. CGH, for example, in which chromosomes from a normal cell are hybridized with a corresponding preparation from a cancer cell provides a means of directly determining cancer-related differences in copy number of chromosomal regions.

“Targeted therapeutics” is used herein to denote any therapeutic modality that affects only or primarily only the cells or tissues selected (“targeted”) for treatment. A monoclonal antibody specific for an antigen expressed only by a target (if retained by the target) is highly useful in targeted therapeutics. In the case of unwanted cells such as cancer cells, if the antibody does not induce destruction of the target directly, it may do so indirectly by carrying to the target, for example, an agent coupled to the antibody. On the other hand, agents that suppress processes that tend to promote uncontrolled proliferation of cells (“antineoplastic agents”) can be delivered to target sites in this manner.

The term “agent” is used herein in its broadest sense to refer to a composition of matter, a process or procedure, a device or apparatus employed to exert a particular effect. By way of non-limiting example, a surgical instrument may be employed by a practitioner as an “excising” agent to remove tissue from a subject; a chemical may be used as a pharmaceutical agent to remove, damage or neutralize the function of a tissue, etc. Such pharmaceutical agents are said to be “anticellular.” Cells may be removed by an agent that promotes apoptosis. A variety of toxic agents, including other cells (e.g., cytotoxic T-cell lymphocytes) and their secretions, and a plethora of chemical species, can damage cells.

The term “by-stander”, as used herein, refers to a process or event initiated or affected by another, causative event or process.

The term “knockdown”, as used herein, refers to a method of selectively preventing the expression of a gene in an individual.

The term “oncogene”, as used herein, refers to any gene that regulates a process affecting the suppression of abnormal proliferative events.

The term “single nucleotide polymorphism” or “SNP”, as used herein, refers to a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species or between paired chromosomes in an individual. Single nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions between genes. Single nucleotide polymorphisms within a coding sequence will not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code. A single nucleotide polymorphism in which both forms lead to the same polypeptide sequence is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, the polymorphism is termed non-synonymous. Single nucleotide polymorphisms that are not in protein-coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA.
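The distinction between synonymous and non-synonymous polymorphisms can be illustrated by translating the reference codon and the variant codon and comparing the results. The following is a minimal sketch using the standard genetic code; the function name classify_coding_snp and its 0-based position argument are hypothetical conventions chosen for this example, and a change to or from a stop codon is simply reported as non-synonymous.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering
# of first, second and third codon positions ("*" denotes a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base change within one DNA codon.

    codon    -- three-letter reference codon, e.g. "GAA"
    position -- 0-based index (0, 1 or 2) of the changed base
    new_base -- the variant base at that position
    """
    codon = codon.upper()
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position of 0, 1 or 2")
    variant = codon[:position] + new_base.upper() + codon[position + 1:]
    if variant == codon:
        raise ValueError("variant base is identical to the reference base")
    same_residue = CODON_TABLE[codon] == CODON_TABLE[variant]
    return "synonymous" if same_residue else "non-synonymous"
```

For instance, changing GAA to GAG leaves the encoded glutamate unchanged and is reported as synonymous, whereas changing GAA to GAC substitutes aspartate and is reported as non-synonymous.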

The term “tissue array” or “tissue microarray”, as used herein, refers to high throughput platforms for the rapid analysis of protein, RNA, or DNA molecules. These arrays can be used to validate the clinical relevance of potential biological targets in the development of diagnostics, therapeutics and to study new disease markers and genes. Tissue arrays are suitable for genomics-based diagnostic and drug target discovery.

As used herein, the term “shRNA” or “short hairpin RNA” refers to a sequence of ribonucleotides comprising a single-stranded RNA polymer that makes a tight hairpin turn on itself to provide a “double-stranded” or duplexed region. shRNA can be used to silence gene expression via RNA interference. The shRNA hairpin is cleaved into short interfering RNAs (siRNA) by the cellular machinery and then bound to the RNA-induced silencing complex (RISC). It is believed that the complex inhibits RNA as a consequence of the complexed siRNA hybridizing to and cleaving RNAs that match the siRNA that is bound thereto.

As used herein, the term “RNA interference” or “RNAi” refers to the silencing or decreasing of gene expression by siRNAs. It is the process of sequence-specific, post-transcriptional gene silencing in animals and plants, initiated by siRNA that is homologous in its duplex region to the sequence of the silenced gene. The gene may be endogenous or exogenous to the organism, present integrated into a chromosome or present in a transfection vector that is not integrated into the genome. The expression of the gene is either completely or partially inhibited. RNAi inhibits the gene by compromising the function of a target RNA, completely or partially. Both plants and animals mediate RNAi by the RNA-induced silencing complex (RISC), a sequence-specific, multicomponent nuclease that destroys messenger RNAs homologous to the silencing trigger. RISC is known to contain short RNAs (approximately 22 nucleotides) derived from the double-stranded RNA trigger, although the protein components of this activity are unknown. However, the 22-nucleotide RNA sequences are homologous to the target gene that is being suppressed. Thus, the 22-nucleotide sequences appear to serve as guide sequences to instruct a multicomponent nuclease, RISC, to destroy the specific mRNAs. Carthew has reported (Curr. Opin. Cell Biol. 13(2): 244-248 (2001)) that eukaryotes silence gene expression in the presence of dsRNA homologous to the silenced gene. Biochemical reactions that recapitulate this phenomenon generate RNA fragments of 21 to 23 nucleotides from the double-stranded RNA. These stably associate with an RNA endonuclease, and probably serve as a discriminator to select mRNAs. Once selected, mRNAs are cleaved at sites 21 to 23 nucleotides apart.

As used herein, the term “siRNAs” refers to short interfering RNAs. In some embodiments, siRNAs comprise a duplex, or double-stranded region, of about 18-25 nucleotides long; often siRNAs contain from about two to four unpaired nucleotides at the 3′ end of each strand. At least one strand of the duplex or double-stranded region of a siRNA is substantially homologous to or substantially complementary to a target RNA molecule. The strand complementary to a target RNA molecule is the “antisense strand”; the strand homologous to the target RNA molecule is the “sense strand”, and is also complementary to the siRNA antisense strand. siRNAs may also contain additional sequences; non-limiting examples of such sequences include linking sequences, or loops, as well as stem and other folded structures. siRNAs appear to function as key intermediaries in triggering RNA interference in invertebrates and in vertebrates, and in triggering sequence-specific RNA degradation during posttranscriptional gene silencing in plants.
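The relationship between the sense strand, the antisense strand and the target RNA described above can be illustrated as follows. This is a minimal sketch only: the function name sirna_strands is hypothetical, the sketch derives only the fully paired duplex region (unpaired 3′ nucleotides, loops and other additional sequences are not modeled), and the length of the input is not checked against the approximately 18-25 nucleotide range.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def sirna_strands(target_segment):
    """Return (sense, antisense) strands, each written 5' to 3'.

    target_segment -- the targeted stretch of the RNA, written 5' to 3';
                      DNA-style "T" is accepted and converted to "U".
    The sense strand is identical to the target segment; the antisense
    strand is its reverse complement and therefore pairs with both the
    target RNA and the sense strand.
    """
    sense = target_segment.upper().replace("T", "U")
    antisense = "".join(COMPLEMENT[base] for base in reversed(sense))
    return sense, antisense
```

As a usage example with an arbitrary 19-nucleotide segment chosen only for illustration, sirna_strands("GCAUGCAUGCAUGCAUGCA") returns the sense strand unchanged together with the antisense strand UGCAUGCAUGCAUGCAUGC.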

The term “xenograft”, as used herein, refers to the transfer or transplant of a cell(s) or tissue from one species to an unlike species (or genus or family).

The term “orthotopic” or “orthotopic xenograft”, as used herein, refers to a cell or tissue transplant grafted into its normal place in the body.

The term “fluorescent activated cell sorting” or “FACS”, as used herein, refers to a technique for counting, examining, and sorting microscopic particles suspended in a stream of fluid. It allows simultaneous multiparametric analysis of the physical and/or chemical characteristics of single cells flowing through an optical and/or electronic detection apparatus. Generally, a beam of light (usually laser light) of a single wavelength is directed onto a hydro-dynamically focused stream of fluid. A number of detectors are aimed at the point where the stream passes through the light beam: one in line with the light beam (Forward Scatter, which correlates to cell volume), several perpendicular to the beam (Side Scatter, which correlates to the inner complexity of the particle and/or surface roughness), and one or more fluorescent detectors. Each suspended particle passing through the beam scatters the light in some way, and fluorescent chemicals found in the particle or attached to the particle may be excited into emitting light at a lower frequency than the light source. By analyzing the combinations of scattered and fluorescent light picked up by the detectors it is then possible to derive information about the physical and chemical structure of each individual particle.

The term “data mining”, as used herein, refers to the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, internet websites, other massive information repositories, or data streams.

The terms “overexpress”, “overexpressing” and grammatical equivalents, as used herein, refer to the production of a gene product at levels that exceed production in normal or control cells. The term “overexpression” or “highly expressed” may be specifically used in reference to levels of mRNA to indicate a higher level of expression than that typically observed in a given tissue in a control or non-transgenic animal. Levels of mRNA are measured using any of a number of techniques known to those skilled in the art including, but not limited to, Northern blot analysis. Appropriate controls are included on the Northern blot to control for differences in the amount of RNA loaded from each tissue analyzed; for example, the amount of 28S rRNA (an abundant RNA transcript present at essentially the same amount in all tissues) present in each sample can be used as a means of normalizing or standardizing the mRNA-specific signal observed on Northern blots. Overexpression may likewise result in elevated levels of proteins encoded by said mRNAs.
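The normalization described above amounts to dividing the mRNA-specific signal by the 28S rRNA signal measured in the same lane and then comparing the normalized values. A minimal sketch follows; the function names are hypothetical, the band intensities are assumed to be in arbitrary but consistent densitometry units, and the numbers in the usage note are invented for illustration and are not experimental data.

```python
def normalized_signal(mrna_signal, rrna_28s_signal):
    """mRNA-specific signal corrected for RNA loading in the same lane."""
    if rrna_28s_signal <= 0:
        raise ValueError("28S rRNA signal must be positive")
    return mrna_signal / rrna_28s_signal


def fold_over_control(sample, control):
    """Ratio of loading-normalized signals.

    sample, control -- (mrna_signal, rrna_28s_signal) pairs.
    A value above 1 indicates a higher normalized mRNA level in the
    sample than in the control.
    """
    control_level = normalized_signal(*control)
    if control_level <= 0:
        raise ValueError("control mRNA signal must be positive")
    return normalized_signal(*sample) / control_level
```

For example, a sample lane with signals (400, 200) compared with a control lane with signals (50, 100) gives a fold value of 4.0, even though the raw mRNA signals alone would suggest an eight-fold difference.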

The term “heatmap”, as used herein, refers to a graphical representation of data where the values obtained from a variable two-dimensional map are represented as colors. As related to the field of molecular biology, heat maps typically represent the level of expression of multiple genes across a number of comparable samples as obtained from a microarray.

The term “phage display”, as used herein, refers to the integration/ligation of numerous genetic sequences from a DNA library, consisting of all coding sequences of a cell, tissue or organism, into the genome of a bacteriophage (i.e., phage) for high-throughput screening of protein-protein and/or protein-DNA interactions. Using a multiple cloning site, these fragments are inserted in all three possible reading frames to ensure that the cDNA is translated. DNA fragments are then expressed on the surface of the phage particle as part of its coat protein. The phage gene and insert DNA hybrid is then amplified by transforming bacterial cells (such as TG1 E. coli cells), to produce progeny phages that display the relevant protein fragment as part of their outer coat. By immobilizing relevant DNA or protein target(s) to the surface of a well, a phage that displays a protein that binds to one of those targets on its surface will remain while others are removed by washing. Those that remain can be eluted, used to produce more phage (by bacterial infection with helper phage) and so produce an enriched phage mixture. Phage eluted in the final step can be used to infect a suitable bacterial host, from which the phagemids can be collected and the relevant DNA sequence excised and sequenced to identify the relevant, interacting proteins or protein fragments.

The term “apoptosis”, as used herein, refers to a form of programmed cell death in multicellular organisms that involves a series of biochemical events that lead to a variety of morphological changes, including blebbing, changes to the cell membrane such as loss of membrane asymmetry and attachment, cell shrinkage, nuclear fragmentation, chromatin condensation, and chromosomal DNA fragmentation. Defective apoptotic processes have been implicated in an extensive variety of diseases; for example, defects in the apoptotic pathway have been implicated in diseases associated with uncontrolled cell proliferations, such as cancer.

The term “bioluminescence imaging” or “BLI”, as used herein, refers to the noninvasive study of ongoing biological processes in living organisms (for example laboratory animals) using bioluminescence, the process of light emission in living organisms. Bioluminescence imaging utilizes native light emission from one of several organisms which bioluminesce. The three main sources are the North American firefly, the sea pansy (and related marine organisms), and bacteria like Photorhabdus luminescens and Vibrio fischeri. The DNA encoding the luminescent protein is incorporated into the laboratory animal either via a virus or by creating a transgenic animal. While the total amount of light emitted via bioluminescence is typically small and not detected by the human eye, an ultra-sensitive CCD camera can image bioluminescence from an external vantage point. Common applications of BLI include in vivo studies of infection (with bioluminescent pathogens), cancer progression (using a bioluminescent cancer cell line), and reconstitution kinetics (using bioluminescent stem cells).

The term “consensus region” or “consensus sequence”, as used herein, refers to the conserved sequence motifs that show which residues are conserved and which residues are variable when comparing multiple DNA, RNA, or amino acid sequence alignments. A consensus sequence is derived from the results of a multiple sequence alignment, in which related sequences are compared to each other and similar functional sequence motifs are found. The consensus sequence shows which residues are conserved (are always the same), and which residues are variable. A consensus sequence may be a short sequence of nucleotides, which is found several times in the genome and is thought to play the same role in its different locations. For example, many transcription factors recognize particular consensus sequences in the promoters of the genes they regulate. In the same way, restriction enzymes usually have palindromic consensus sequences, usually corresponding to the site where they cut the DNA. Splice sites (sequences immediately surrounding the exon-intron boundaries) can also be considered as consensus sequences. In one aspect, a consensus sequence defines a putative DNA recognition site, obtained for example, by aligning all known examples of a certain recognition site and defined as the idealized sequence that represents the predominant base at each position. Related sites should not differ from the consensus sequence by more than a few substitutions.
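The derivation of a consensus sequence as “the predominant base at each position” can be illustrated with a short calculation over pre-aligned sequences. The following is a minimal sketch; the function name consensus is hypothetical, the sequences are assumed to be already aligned to equal length, and where two residues are equally frequent at a position the one encountered first is chosen, which is an arbitrary tie-breaking assumption. The example sites in the usage note are invented for illustration.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most frequent residue at each aligned position."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # zip(*...) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_sequences)
    )
```

For example, the four aligned sites TATAAT, TATGAT, TACAAT and TATAAT yield the consensus TATAAT, and each individual site differs from that consensus by at most one substitution.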

The term “linkage”, or “genetic linkage,” as used herein, refers to the phenomenon that particular genetic loci of genes are inherited jointly. The “linkage strength” refers to the probability of two genetic loci being inherited jointly. As the distance between genetic loci increases, the loci are more likely to be separated during inheritance, and thus linkage strength is weaker.

The term “neighborhood score”, as used herein, refers to the relative value assigned to a genomic locus based on a geometry-weighted sum of expression scores of all the genes on a given chromosome, as a measurement of the copy number status of the locus. A positive neighborhood score is indicative of an increase in copy number, whereas a negative neighborhood score is indicative of a decrease in copy number.

The term “expression score”, as used herein, refers to the expression differences (i.e., the level of transcription (RNA) or translation (protein)) between comparison groups on a given chromosome. The expression score for a given gene is calculated by correlating the level of expression of said gene with a phenotype in comparison. For example, an expression score may represent a comparison of the expression differences of a given gene in normal vs. abnormal conditions, such as parental vs. drug-resistant cell lines. As used herein, the term “regional expression score” refers to the expression score of gene(s) in proximity to the locus in consideration. Since linkage strength between genetic loci decreases (i.e. decays) as the distance between them increases, the “regional expression score” more accurately reflects the expression differences between comparison groups by assigning greater weight to the expression scores of genes in proximity to the locus in consideration.

The terms “geometry-weighted” or “geometry-weighted sum”, as used herein, refer to the significance attached to a given value, for example an “expression score”, based on physical position, including but not limited to genomic position. Since linkage strength between genetic loci decreases (i.e. decays) as the distance between them increases, the “weight” assigned to a given value is adjusted accordingly.
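The relationship among the “expression score”, the “geometry-weighted sum” and the “neighborhood score” defined above can be sketched as a weighted sum in which each gene's weight falls off with its distance from the locus under consideration. The definitions above do not specify a particular weighting function; the exponential decay and the decay_length parameter used below are assumptions made solely for illustration, and the function name neighborhood_score is hypothetical.

```python
import math


def neighborhood_score(locus_position, genes, decay_length=1_000_000.0):
    """Geometry-weighted sum of expression scores around a locus.

    locus_position -- genomic coordinate of the locus, in base pairs
    genes          -- iterable of (position_bp, expression_score) pairs
                      for genes on the same chromosome
    decay_length   -- assumed distance scale (bp) over which the weight
                      decays; an illustrative parameter only
    """
    if decay_length <= 0:
        raise ValueError("decay_length must be positive")
    total = 0.0
    for position, score in genes:
        distance = abs(position - locus_position)
        weight = math.exp(-distance / decay_length)  # 1.0 at the locus itself
        total += weight * score
    return total
```

With this form, genes close to the locus dominate the sum, a predominance of positive expression scores nearby gives a positive neighborhood score (suggesting a gain in copy number), and a predominance of negative scores gives a negative one (suggesting a loss).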

The term “copy number alteration” or “CNA”, as used herein, refers to the increase (i.e. genomic gain) or decrease (i.e. genomic loss) in the number of copies of a gene at a specific locus of a chromosome as compared to the “normal” or “standard” number of copies of said gene at that locus. As used herein, an increase in the number of copies of a given gene at a specific locus may also be referred to as an “amplification” or “genomic amplification” and should not be confused with the use of the term “amplification” as it relates, for example, to amplification of DNA or RNA in PCR and other experimental techniques.

The term “clonogenic assay”, as used herein, refers to a technique for studying whether a given cancer therapy (for example drugs or radiation) can reduce the clonogenic survival and proliferation of tumor cells. While any type of cell may be used, human tumor cells are commonly used for oncological research. The term “clonogenic” refers to the fact that these cells are clones of one another.

The term “adjuvant therapy”, as used herein, refers to additional treatment given after the primary treatment to increase the chances of a cure. In some instances, adjuvant therapy is administered after surgery where all detectable disease has been removed, but where there remains a statistical risk of relapse due to occult disease. If known disease is left behind following surgery, then further treatment is not technically “adjuvant”. Adjuvant therapy may include chemotherapy, radiation therapy, hormone therapy, or biological therapy. For example, radiotherapy or chemotherapy is commonly given as adjuvant treatment after surgery for a breast cancer. Oncologists use statistical evidence to assess the risk of disease relapse before deciding on the specific adjuvant therapy. The aim of adjuvant treatment is to improve disease-specific and overall survival. Because the treatment is essentially for a risk, rather than for provable disease, it is accepted that a proportion of patients who receive adjuvant therapy will already have been cured by their primary surgery. Adjuvant chemotherapy and radiotherapy are often given following surgery for many types of cancer, including colon cancer, lung cancer, pancreatic cancer, breast cancer, prostate cancer, and some gynecological cancers.

The term “matched samples”, as used herein, as for example “matched cancer samples”, refers to a set of samples in which each member is matched with every other member by reference to a particular variable or quality other than the variable or quality immediately under investigation. Comparison of dissimilar groups based on specified characteristics is intended to reduce bias and the possible effects of other variables. Matching may be on an individual (matched pairs) or a group-wide basis.

The term “genomic segments”, as used herein, refers to any defined part or region of a chromosome, and may contain zero, one or more genes.

The term “co-administer”, as used herein, refers to the administration of two or more agents, drugs, and/or compounds together (i.e. at the same time).

The term “diagnose” or “diagnosis”, as used herein, refers to the determination, recognition, or identification of the nature, cause, or manifestation of a condition based on signs, symptoms, and/or laboratory findings.

DETAILED DESCRIPTION

Receptor protein tyrosine kinases (RPTKs) transmit extracellular signals across the plasma membrane to cytosolic proteins, stimulating the formation of complexes that regulate key cellular functions. Over half of the 90 tyrosine kinases have been implicated in human cancers and are for this reason considered highly promising drug targets. To gain insight into the tyrosine kinases that contribute to breast cancer related cellular mechanisms, we carried out a large-scale loss-of-function analysis of the tyrosine kinases, using RNA interference, in the clinically relevant Erb-B2 positive, BT474 breast cancer cell line. The cytosolic, non-receptor tyrosine kinase Bruton's tyrosine kinase (BTK), which has been extensively studied for its role in B cell development, was among those tyrosine kinase genes required for BT474 breast cancer cell survival. The BTK protein identified was an alternative form containing an amino-terminal extension. This alternative form of the Btk message is also present in tumorigenic breast cells at significantly higher levels than in normal breast cells.

In the large-scale RNAi screen we found that knockdown of nearly one third of the human TKs screened reduced cellular proliferation by more than half relative to control levels in an ErbB-2 over-expressing breast cancer cell line. Unexpectedly, four of the five non-receptor tyrosine kinases that exhibited the strongest impact on cellular proliferation were members of the Tec family of cytoplasmic tyrosine kinases. Further exploration of one Tec family member, Bruton's Tyrosine Kinase (BTK), revealed that its knockdown using siRNAs led to an increase in apoptosis. A unique Btk transcript was isolated from BT474 cells, which encodes an additional 34 amino acids in frame with the published BTK start codon, suggesting that an N-terminally elongated form of the BTK protein is present in BT474 breast cancer cells. The expression of this novel Btk transcript is higher in a number of breast cancer cell lines compared to non-tumorigenic breast cell lines. These results suggest that an alternative BTK protein, potentially with other Tec family tyrosine kinases, contributes to breast cancer cell survival.

RNAi knockdown screen of the PTKs in BT474 breast cancer cells. An unbiased functional RNAi screen targeting the PTKs in the clinically relevant ErbB-2 positive, BT474 breast cancer cell line was performed to identify additional TKs that, when knocked down, sensitized the cells to cell death. 236 short-hairpin RNAs (shRNAs) (Paddison et al., 2004) were used to target 82 of the 90 PTK genes in the ErbB-2-positive breast cancer cell line, and effects on cells were monitored using alamarBlue, a fluorimetric indicator of both cell proliferation and viability (Kourtidis et al., 2007) (FIG. 1). Inhibition of 25 of the 82 genes using shRNAs led to a fifty percent or greater decrease in BT474 cellular proliferation compared to control levels, in three replicate experiments using at least two unique shRNAs per gene. EGFR, ERBB2, ABL2, FES, NTRK2 (TRK-B), PTK2B, FGFR2, LYN (V-yes-1), EphA1, and BTK were among the kinases that, when knocked down, caused the greatest reduction in BT474 cellular proliferation levels (FIG. 1 bottom panel). The validity of the screen is supported in that many of these PTKs have previously described roles in breast tumors (EGFR, ERBB2, FGFR2, PTK2B, NTRK2/TRK-B, EphA1, ABL2) (Brantley-Sieders et al., 2005; Ogawa et al., 2000). For instance, overexpression of ERBB2 in mammary epithelial cells causes malignant transformation, and amplification of ERBB2 in invasive primary breast cancers correlates with reduced patient survival (Baselga, 2006, Science). FGFR2 is amplified or overexpressed in 5-10% of breast tumors (Adnane et al., 1991; Cha et al., 2008; Penault-Llorca et al., 1995), and has been the focus of several genome-wide association studies covering thousands of unique breast tumors (Hunter et al., 2007). NTRK2/TRKB is the brain-derived neurotrophic factor (BDNF) receptor that is expressed in a subset of high-grade human breast tumors (Cameron and Foster, 2008).
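The hit criterion described above can be sketched in code. The following is a minimal illustration only, under one possible reading of the criterion: a gene is called a hit when at least two unique shRNAs each reduce the alamarBlue signal to fifty percent or less of control in every replicate. The function name `call_hits`, the gene and shRNA labels, and all viability values are hypothetical and are not taken from the screen data.

```python
from collections import defaultdict


def call_hits(readings, threshold=0.5, min_shrnas=2):
    """Call hit genes from normalized viability readings.

    readings maps (gene, shrna_id) to a list of viability values,
    one per replicate, each expressed as a fraction of control.
    """
    passing = defaultdict(set)
    for (gene, shrna), replicates in readings.items():
        # An shRNA counts only if every replicate is at or below the threshold.
        if replicates and all(value <= threshold for value in replicates):
            passing[gene].add(shrna)
    # A gene is a hit when enough unique shRNAs pass.
    return sorted(gene for gene, shrnas in passing.items()
                  if len(shrnas) >= min_shrnas)


# Hypothetical readings: three replicates per shRNA.
example = {
    ("GENE_A", "sh1"): [0.42, 0.38, 0.47],
    ("GENE_A", "sh2"): [0.31, 0.35, 0.29],
    ("GENE_B", "sh1"): [0.45, 0.49, 0.40],
    ("GENE_B", "sh2"): [0.80, 0.75, 0.91],
    ("GENE_C", "sh1"): [0.95, 1.02, 0.99],
}

print(call_hits(example))  # ['GENE_A']
```

In this toy input only GENE_A passes, because GENE_B has a single effective shRNA and GENE_C shows no reduction. A looser reading of the criterion (for example, averaging the replicates before thresholding) would need a different test inside the loop.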

Btk silencing leads to increased apoptosis. Bruton's Tyrosine Kinase (BTK) was among those genes whose knockdown caused the most significant reduction in BT474 cellular proliferation (FIG. 1). Mutations in the human Btk gene cause inherited X-linked agammaglobulinemia, which is characterized by a virtual absence of B lymphocytes. This is due to a block between the pro- and pre-B cell stages of B cell maturation (Tsukada et al., 1993); (Lindvall et al., 2005). Btk over-expression has been implicated in imatinib resistance in chronic myeloid leukemia (CML) and acute lymphoblastic leukemia (ALL) (Villuendas et al., 2006); (Hofmann et al., 2002), and its constitutive activation due to deregulated B cell receptor (BCR) engagement is an integral component of certain B cell lymphomas (Irish et al., 2006); (Kuppers, 2005). In B-lineage lymphoid cells, Btk serves a protective role through inhibition of Fas/APO-1 mediated apoptosis (Qiu and Kung, 2000; Vassilev et al., 1999). We chose to interrogate further the role Btk might have in breast cancer cells, since no function had been previously described for it in either normal or malignant breast cellular processes. Quantitative PCR (qPCR) analysis detected the Btk transcript towards the later rounds of cycling (34th of 40 total cycles), indicating that Btk levels in BT474 cells are relatively low. Nevertheless, the Btk transcript could be specifically knocked down a further 2.2 fold using siRNAs (data not shown). The initial shRNA screen, utilizing a redox indicator assay, revealed a severe decrease in cellular proliferation after knockdown of Btk in BT474 cells. To determine if reduced proliferation correlated with an increase in apoptosis, BT474 cells were transfected with siRNAs targeting Btk and cleaved caspase 3 levels were compared to control cells. BT474 cells that were transfected with the Btk siRNA had an 11 fold increase in apoptotic cells compared to control (FIG. 2 a-2 c), indicating that the loss of proliferation in Btk silenced BT474 cells is due, at least in part, to increased apoptosis.
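For readers less familiar with how qPCR cycle numbers relate to fold changes, the following sketch shows the standard comparative Ct calculation (2 to the power of minus delta-delta-Ct). It is an illustration only: it assumes perfect doubling per cycle and normalization to a reference gene, neither of which is stated in the text. Only the control Ct of 34 comes from the description above; the reference-gene Ct values and the treated Ct are hypothetical numbers chosen so that the result reproduces a roughly 2.2 fold knockdown.

```python
def relative_expression(ct_target_control, ct_ref_control,
                        ct_target_treated, ct_ref_treated):
    """Relative transcript level (treated / control) by the 2^-ddCt method.

    Assumes 100% amplification efficiency, i.e. one cycle equals a
    two-fold difference in starting template.
    """
    delta_control = ct_target_control - ct_ref_control
    delta_treated = ct_target_treated - ct_ref_treated
    delta_delta_ct = delta_treated - delta_control
    return 2.0 ** (-delta_delta_ct)


# Control cells: target detected at cycle 34 (from the text).
# Reference-gene Ct of 18.0 and treated Ct of 35.14 are hypothetical.
ratio = relative_expression(34.0, 18.0, 35.14, 18.0)
knockdown_fold = 1.0 / ratio

print(round(knockdown_fold, 2))
```

Under these assumptions, a 2.2 fold knockdown corresponds to a shift of only about 1.1 cycles, which is why detection near cycle 34 of 40 still leaves room to measure the reduction.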

A novel form of the Btk message is present in breast cancer cells. Although no function has been previously described for Btk in breast cells, inhibiting Btk using shRNAs severely reduced the proliferation of the BT474 breast cancer cell line compared to control (FIG. 1). The presence of the Btk transcript in BT474 breast cancer cells was confirmed by amplifying the transcript from cDNA. Two distinct Btk specific primer sets were generated by designing primer pairs to unique regions of the Btk gene at nucleotide positions 142 and 2,522 (Btk 5′UTR) and 519 and 2,079 (Btk internal) (Table 1). The Btk internal primer set is specific to sites located between the translational start and stop codons, while the Btk 5′UTR forward primer is located 22 bps upstream from the translational start codon, within the 5′UTR, and the downstream primer is located 379 bps downstream from the translational stop codon. Interestingly, while a product of the expected size was amplified from BT474 cDNA using the Btk internal forward primer with either of the reverse primers (SI FIG. 1 and data not shown), no product was amplified from BT474 cDNA when the Btk 5′UTR forward primer was used with either of the reverse primers (SI FIG. 1). The 5′UTR primer set did generate a product of the expected size from a positive control cDNA sample generated from Namalwa B-cells (SI FIG. 1 a), suggesting that the amino-terminal sequence of the Btk transcript in BT474 cells differed from the published sequence.

BT474 breast cancer cells express an alternative form of the Btk message from an alternative promoter. To confirm the full length Btk product in BT474 cells, sequence information was obtained upstream from the published Btk start codon using rapid amplification of cDNA ends (5′RACE). A sequence alignment that included the first 395 nucleotides of the published Btk exon1 sequence (accession #U13399) and the 398 nucleotides obtained using 5′ RACE revealed that while the sequences were 100% identical from position 307 downstream, the sequence upstream of position 307 was non-homologous (FIG. 3 a).

The Btk sequence obtained from BT474 cells using 5′RACE is 100% identical to two entries in the genome database (Levy et al., 2007) (Griffiths-Jones, 2004) that were derived using an automated analysis for gene prediction program (GNOMON). The sequence is named Btk-cra-C (hereafter referred to as Btk-C) reflecting its status as an automated computational prediction rather than an experimentally verified gene message. In addition, the Btk-C sequence is 100% identical to two sequences in the expressed sequence tag (EST) database. The first of the two EST sequences was obtained from a human pheochromocytoma tissue sample (Yang, Y. et al. 2000, unpublished; accession #AV733045) and the second from a study seeking to identify putative alternative promoters of human genes from human peripheral blood mononuclear cells (PEBLM2), (Kimura et al., 2006).

The unique first exon present in the Btk-C message is located 4,416 bps to the 3′ side of the first exon from the published Btk gene (Btk-cra-A, hereafter referred to as Btk-A) and 255 bps to the 5′ side of the ribosomal protein L36a (FIGS. 3 b & 3 c). Evidence of additional ESTs that are identical to the full-length Btk-C transcript, along with the identification of putative pol II promoter sites (Ref) and transcription factor binding sites including Ets, Ap2, AhR and HoxA7 (Matys et al., 2003), strongly suggests that the Btk-C transcript found in BT474 cells is driven by an alternate promoter located 4,416 bps to the 3′ side of the published Btk-A promoter. In addition, the first exons of the two transcripts must use different donor sites to yield the mature RNAs (FIG. 3 a-3 c).

Furthermore, translation of the Btk-C nucleotide sequence into amino acids revealed an additional 47 amino acids that, when aligned to the BTK-A amino acid sequence, are in frame with the BTK-A methionine start codon. Importantly, the additional 47 amino acid stretch present in the Btk-C mRNA from BT474 breast cancer cells contains two additional methionine codons, at nucleotide positions 241-243 and 265-267, respectively (FIG. 3 a), creating a putative elongated BTK protein (FIG. 4 a). Bioinformatic analyses (ExPASy Proteomics Tools) of the BTK-C additional amino acid sequence for conserved motifs, including cleavage, phosphorylation, N-glycosylation, and N-myristoylation sites, did not identify any putative functional roles. Although neither of the novel translational start sites contains a strongly conserved Kozak sequence, a transcription start site prediction program (Down and Hubbard, 2002) identified a putative transcription start site, located within a CpG island, 200 bps upstream from the start of the Btk-C mRNA. Additionally, promoter prediction analysis (Knudsen, 1999) of 2500 nucleotides of the Btk-C sequence, located just upstream from the transcription start site (TSS), predicted the presence of two putative promoters. The first is located 823 bps upstream from the Btk-C TSS and is predicted to be a highly likely promoter with a score of 1.156, and the second is located 22 nucleotides upstream from the Btk-C TSS and is predicted to be a promoter with marginal likeliness with a score of 0.699. A similar promoter prediction analysis using the 2500 nucleotides located just upstream from the Btk-A TSS also predicted the presence of two putative promoters, but at greater distances from the TSS and with lower likeliness scores compared to either of the Btk-C predicted promoters. The first predicted Btk-A promoter is located 867 bps upstream from the Btk-A TSS and has a marginal likeliness score of 0.649, and the second is located 388 bps upstream from the Btk-A TSS with a marginal likeliness score of 0.569.
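The notion of additional methionine codons lying "in frame" with the published start codon can be made concrete with a short sketch. The function below walks upstream from an annotated ATG in steps of three nucleotides and reports every ATG it meets before the first in-frame stop codon. It is a minimal illustration only; the function name `upstream_inframe_starts` and the toy sequence are hypothetical, and the sequence is not the Btk-C message.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_starts(mrna, annotated_start):
    """Return 0-based positions of ATG codons upstream of annotated_start.

    Only codons in the same reading frame as the annotated start are
    examined, and the walk stops at the first in-frame stop codon,
    because translation starting further upstream could not read
    through into the annotated open reading frame.
    """
    seq = mrna.upper()
    if seq[annotated_start:annotated_start + 3] != "ATG":
        raise ValueError("annotated_start does not point at an ATG codon")
    starts = []
    pos = annotated_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            starts.append(pos)
        pos -= 3
    return sorted(starts)


# Toy message: two upstream ATGs in frame with the annotated ATG at index 17.
toy = "GC" + "ATG" + "GCCGCC" + "ATG" + "GCA" + "ATG" + "AAA"
print(upstream_inframe_starts(toy, 17))  # [2, 11]
```

The two methionine codons reported in the text begin 24 nucleotides apart (positions 241 and 265), a multiple of three, which is what places them in the same reading frame as each other.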

Consistent with the bioinformatic TSS and promoter prediction analyses of the Btk-C message, western blotting of BT474 total lysate using a BTK specific antibody detected a faint 80 kD sized product, the predicted size of the BTK-C protein if it were translated from the first of the two additional start codons (FIG. 4 b). The BTK-C protein levels are low, suggesting that transcription from the Btk-C promoter is weak. However, no BTK-A specific sized product was detected in BT474 cell lysate, although it was readily detected in B-cell lysate (FIG. 4 b), indicating that expression of the Btk-C transcript is distinct from Btk-A expression.

The Btk-C transcript encodes an alternative protein. To engineer more of the BTK-C protein, the Btk-C sequence beginning with the region corresponding to the new start codon and continuing to the stop codon was cloned into a Hygro-MarxIV over-expression vector (Hannon et al., 1999) containing a triple flag tag sequence (hereafter referred to as the Btk-C-flag vector). For comparison, the Btk-A sequence, beginning with the published start codon and continuing to the stop codon, was also cloned into the triple flag tag Hygro-MarxIV vector (hereafter referred to as the Btk-A-flag vector). 293FT cells were co-transfected with the Btk-A-flag vector or Btk-C-flag vector as well as either the Btk shRNA construct or a control shRNA construct. The 293FT cells containing the over-expressed Btk-A protein alone or with the control shRNA yielded a 79.5 kD molecular weight product, the predicted size of the Btk-A protein containing a triple flag tag (FIG. 4 c). The 293FT cells containing the over-expressed Btk-C protein alone or with the control shRNA, however, yielded two products. The smaller product is approximately the predicted molecular weight of the Btk-A protein containing a triple flag tag (79.5 kD) and the larger product is the predicted molecular weight of the Btk-C protein containing a triple flag tag if it were translated from the first of the two novel methionine codons (83 kD) (FIG. 4 c). Without intending to limit the invention in any embodiment by any theory as to how the embodiment works, Applicants believe that the most likely explanation for the two Btk-C products is that the first of the two additional methionine codons is being used as a translational start site as well as the original methionine start codon, which contains a good Kozak consensus sequence. Interestingly, data from western blotting, qPCR and 5′RACE (FIG. 4 b and data not shown) indicate that only the Btk-C transcript is present in BT474 cells, suggesting that the Btk-A promoter is not active in BT474 cells.

293FT cells stably overexpressing either the BTK-A or BTK-C proteins that were transfected with the shRNA targeting Btk contained significantly less cross reactive protein (FIG. 4 c). Additionally, the transient transfection of BT474 cells stably over-expressing Btk-C with siRNAs targeting Btk resulted in an approximate 70% decrease in Btk protein compared to control (FIG. 4 d). Taken together these results confirm that the Btk shRNA and siRNA are strong and specific effectors of Btk gene silencing.

To assess BTK activation in BT474 cells, the phosphorylation status of tyrosine residue number 223, which becomes auto-phosphorylated after activation, was assessed. BT474 cells stably over-expressing either the Btk-A-flag or Btk-C-flag proteins were subjected to immunoprecipitation using a flag specific antibody and the immunoprecipitates were separated by SDS-PAGE. Blots were probed with an anti-phospho Tyr223-Btk antibody or a total Btk antibody to control for loading. The Btk-A protein and both forms of the Btk-C protein were phosphorylated, indicating that the Btk-C protein is activated in BT474 cells (FIG. 5 a). Addition of the Btk specific inhibitor LFM-A13 severely impeded phosphorylation of both the Btk-A and Btk-C proteins, indicating that autophosphorylation of the elongated Btk-C protein is inhibited to a similar level as Btk-A using the Btk-A specific inhibitor LFM-A13 (FIG. 5 a). Consistent with this, inhibiting Btk phosphorylation in BT474 cells using 25 µM LFM-A13 increased apoptosis levels by 8% compared to control, further establishing the protective role that Btk plays in BT474 cell survival (FIGS. 5 b & c). There was no significant change in growth rate or morphology in BT474 cells stably over-expressing the BTK-A or the BTK-C proteins compared to control cells under either standard conditions or serum-free conditions. Thus, although decreased expression results in apoptosis, Btk over-expression does not confer a growth advantage under these conditions (data not shown).

Btk is detected in BT474 cell cytoplasm using immunofluorescence. Immunofluorescent (IF) confocal images were taken of wild type BT474 cells and of BT474 cells containing either a stably integrated control Hygro-MarxIV triple flag tag vector (hereafter referred to as control vector), the Btk-A-flag vector or the Btk-C-flag vector. As was expected, no BTK specific signal was generated using a flag tag specific antibody in wt BT474 cells or cells stably over-expressing the control vector, but a signal was seen in the cytoplasm of both cell lines stably over-expressing either the BTK-A-flag or the BTK-C-flag proteins (FIG. 6 b). However, IF images taken of cells probed with a BTK specific antibody were positive for BTK in the cytoplasm of wt BT474 cells (FIG. 6 a) as well as in the cytoplasm of cells stably overexpressing the control vector (FIG. 6 b). As would be expected, cells stably overexpressing either the BTK-A or BTK-C proteins contained noticeably more signal than control cells. The endogenous BTK protein was most likely more visible using immunofluorescent confocal imagery because certain antibodies are more amenable to immunofluorescent confocal imaging protocols compared to SDS-PAGE immunoblotting.

Btk-C is elevated in breast cancer cells. The data from western blotting, 5′RACE and RT-PCR suggested that the Btk-C message might be preferentially expressed in breast cancer cells compared to non-tumorigenic cells. To specifically amplify the Btk-A and Btk-C messages, two distinct qPCR primer sets were designed to the unique region of the sequences located within the 5′UTRs (Btk-A_5′UTR and Btk-C_5′UTR, respectively). cDNA from the Btk-A positive B-cell line (Namalwa), the breast cancer cell lines BT474, MCF7, and MDA-MB361, as well as the non-tumorigenic breast cell lines MCF10a and HMEC, was amplified with each primer set using SYBR Green. A product was detected only for the Btk-A positive malignant B cell line Namalwa using the Btk-A specific primer set (data not shown). Products were detected in all breast cancer samples using the Btk-C specific primer set. The non-tumorigenic breast samples produced a signal inconsistently and at the last round of cycling, suggesting the transcript levels were at the limit of detection. Both the BT474 and MCF7 breast cancer cell lines had 4-fold more transcript compared to either of the non-tumorigenic breast cell lines MCF10a and HMEC and the malignant B-cell line Namalwa (FIG. 7 a). These results indicate Btk-C expression is enhanced in breast cancer cells compared to non-tumorigenic breast cells. While not wishing to be bound by any theory of how embodiments of the invention work, this result raises the possibility that mis-expression of the Btk-C transcript, through use of an alternative promoter, may support the unregulated growth characteristic of malignant cells.

A search for Btk expression in clinical breast cancer tissues (Oncomine) revealed that Btk levels are elevated in forty-three percent (seven total samples) of invasive ductal cancers compared to all fifteen matched, cancer-free breast tissue samples analyzed (Karnoub et al., 2007). In a second study that profiled the expression levels of 198 breast cancer samples, Btk expression was upregulated in 13% of tissue samples from patients with invasive ductal breast cancer (Desmedt et al., 2007). Although the Affymetrix probes used to target the Btk gene in these studies do not discriminate between the Btk-A and the Btk-C forms, based upon our data we would predict that the Btk-C form is being expressed in these clinical cancer samples. Consistent with the Oncomine expression data, using a BTK specific antibody, BTK was detected in a clinical breast cancer tissue sample but not in a matched non-tumorigenic breast tissue sample (FIG. 7 b).

Using an unbiased RNAi approach to screen 91% of the human genome's PTKs, we have found that 29% of the total TKs examined strongly contributed to the proliferative potential of the breast cancer cell. Among these TKs, 54% were receptor TKs and 46% were non-receptor cytoplasmic tyrosine kinases. As expected, known survival kinases such as EGFR, ERBB2, FGFR2, LYN, PTK2B, and NTRK2/TRK-B were identified in the screen. EGFR and ERBB2 are known critical survival kinases, and ERBB2 is amplified and constitutively activated in the BT474 breast cancer cell line. ERBB2 has no known ligand but rather becomes activated through dimerization with other EGFR family members, resulting in constitutive signaling cascades through PLCgamma, PI3K and RAS (Fig. pPLCg2 blot; Serra et al., 2008; Eckert et al., 2004).

Additionally, we have revealed previously unrecognized roles for members of the Eph family of receptor tyrosine kinases and Tec family of cytoplasmic tyrosine kinases in promoting breast cancer cell survival in this Erb-B2 positive breast cancer. Four of the 25 TKs that caused the greatest inhibition of BT474 cellular proliferation when knocked down were Eph receptor TKs (FIG. 1). Eph receptors and Eph ligands have been well studied for their role in neuronal development (Klein, 2004). Additional functions have been described including involvement in vascular development during embryogenesis, in cell to cell communication and in the regulation of cellular morphogenesis, adhesion and migration (Arvanitis and Davy, 2008); (Merlos-Suarez and Batlle, 2008) (Noren and Pasquale, 2004). Interestingly, Eph Receptors are also expressed on platelets and have been implicated in platelet aggregation at sites of vascular injury (Prevost et al., 2003).

The formation of breast carcinomas is accompanied by the recruitment of a “variety of stromal cells (such as MSCs) with both pro- and anti-tumorigenic activities” (Karnoub et al., 2007); (Bissell and Radisky, 2001). The response is similar to wound healing and scar formation, and involves the continuous deposition of growth factors, cytokines and matrix-remodeling proteins, such that a tumor site is like a ‘wound that never heals’ (Park et al., 2000). Similarly, both sites of vascular injury and sites of tumor initiation lead to the formation of a thrombus, the process by which collagen or thrombin activates freely circulating platelets, leading to their adherence to the injured wall and then to each other, resulting in the formation of a fibril clot (Prevost et al., 2005).

The Eph kinase receptors EphA4 and EphB1 are expressed on platelets (Prevost et al., 2005), and Eph receptor interaction with ligand promotes adhesion and aggregation at sites of vascular injury, in an event that depends, at least partially, on the Ras family member Rap1 (Prevost et al., 2005); (Prevost et al., 2004). Furthermore, Eph receptors are known to associate with Src family tyrosine kinases and to signal through cytoplasmic tyrosine kinases (Kullander and Klein, 2002). Following platelet activation, EphA4 becomes associated with the Src family cytoplasmic TKs Lyn and Fyn and may promote the phosphorylation of integrin B3 (Prevost et al., 2002). Lyn is another TK integral to B cell receptor signaling and, when knocked down, led to a significant decrease in BT474 cellular proliferation (FIG. 1). Without wishing to suggest that embodiments of the invention work according to any particular mechanism or theory, it is interesting that EGF promotes wound healing (Hardwicke et al., 2008), suggesting that a potentially cooperative or shared signaling pathway exists for these receptors and ligands (Lo et al., 2006). Further studies will need to be conducted to determine if EGF and/or EGF receptor family members cooperate with the Eph receptors to promote breast cancer cell survival.

In addition, we found that knockdown of four of the five Tec family member kinases resulted in reduced BT474 proliferation (FIG. 1). The Tec kinases are known primarily for their roles in immune development and function. Yet, further evaluation of one family member, BTK, led to the discovery of a novel protein containing an amino-terminal extension and two additional start codons. A search of the EST database using sequence specific to the Btk-C transcript retrieved two identical EST sequences, verifying that this gene is actively transcribed from an alternative promoter five thousand nucleotides downstream from the Btk-A promoter. Applicants will not be bound by any theory of how embodiments of the invention work. However, expression of Btk-C but not Btk-A in a number of breast cancer cells but not in non-tumorigenic cells suggests that deregulation of the promoter is responsible for its expression in the cancer cells. In support of this notion, Btk levels are elevated in several ductal carcinoma tissue samples compared to all normal breast tissue samples analyzed in a study represented in the cancer gene expression database, Oncomine (Karnoub, 2007, Nature). Further experiments will need to be done to determine if the Btk-C variant is in fact the form that is elevated in these breast carcinomas.

A number of recent papers provide data that are consistent with our results implicating hematopoietic associated cytoplasmic TKs in critical functions in solid tumors. ABL2 is a cytoplasmic TK, highly similar to the Src and Tec families of cytoplasmic TKs, whose constitutive activation, generated through chromosomal translocation into breakpoint cluster regions (BCR-Abl) and Tel genes (Tel-Abl) (Advani and Pendergast, 2002), causes various forms of leukemia and myeloproliferative diseases (Tefferi and Gilliland, 2007). Recently, however, Abl has been implicated, for the first time, in breast cancer cell pathogenic processes (Srinivasan and Plattner, 2006). Abl was found to be constitutively active downstream of deregulated ErbB receptors and Src family tyrosine kinases in highly invasive breast cancer cell lines (Srinivasan and Plattner, 2006) (Srinivasan et al., 2008).

In PTEN negative prostate cancer cell lines, LNCaP and PC3, knockdown of the Tec family kinase BMX using siRNAs caused suppression of cell growth. BMX was found to be activated by the ErbB2/ErbB3 receptors and the EGF receptor in a PI3-K dependent and independent manner, respectively. An interaction was identified between BMX and ErbB-3 using immunoprecipitation and immunoblotting. Furthermore, the cytoplasmic tyrosine kinase Src was shown to be responsible for the phosphorylation of BMX prior to membrane recruitment as a Src inhibitor blocked its activation. The authors propose that BMX has a role in integrating the PI3-K and ErbB2/ErbB3 signaling pathways (Jiang et al., 2007). We also noted cell death of BT474 cells after Src knockdown (data not shown), suggesting (without wishing to be bound by theory or hypothesis) that Btk and the other Tec family tyrosine kinases may serve a similar function in ErbB-2 positive breast cancers.

To determine how the RANK and Immune Receptor (ITAMs) signaling pathways converge to promote osteoclast differentiation a genome-wide screen of the nonreceptor tyrosine kinases revealed that osteoclasts, but not osteoblasts, express high levels of Btk and Tec. Osteoclasts are derived from bone marrow cells and are under the control of the immune system. The authors conclude (without intending that embodiments of the invention must work according to the hypothesis) that RANKL stimulates the Btk and Tec kinases to form a signaling complex with other molecules, such as the adaptor protein BLNK and the tyrosine kinase Syk, which leads to PLCgamma phosphorylation and the induction of calcium signaling essential for osteoclastogenesis (Shinohara et al., 2008).

Do recruited cell types, such as mesenchymal stem cells, associate with primary tumor cells in such a way as to stimulate Btk and other Tec family tyrosine kinases, leading to a convergence of signaling pathways that favor cell survival? Applicants pose this question without admitting in any way that embodiments of the invention work according to the hypothesis implied by the question. In any event, the EGF and BCR signaling complexes and regulated downstream signaling pathways are remarkably similar (Donjerkovic and Scott, 2000); (Lo et al., 2006). Both involve signaling through PLCgamma, PI3K and RAS with the resultant calcium flux and subsequent activation of MAPK/JNK. PLCgamma activation leads to the hydrolysis of PIP2 and the production of DG and IP3. DG induces/phosphorylates PKC, leading to the activation of ELK1, and IP3 leads to calcium mobilization. The resulting cellular message is pro-survival. An RNAi screen conducted to identify tyrosine kinases and phosphatases that would sensitize chemoresistant cancer cells to apoptosis found a number of calcium-regulated kinases (CaMK1g, CaMKIINa, CaMKIIB and CaMKIId) to be potent survival kinases (MacKeigan et al., 2005), suggesting that kinases that regulate calcium flux may be important therapeutic targets.

REFERENCES

-   Adnane, J., Gaudray, P., Dionne, C. A., Crumley, G., Jaye, M., Schlessinger, J., Jeanteur, P., Birnbaum, D. and Theillet, C. (1991) BEK and FLG, two receptors to members of the FGF family, are amplified in subsets of human breast cancers. Oncogene, 6, 659-663.
-   Advani, A. S. and Pendergast, A. M. (2002) Bcr-Abl variants: biological and clinical aspects. Leuk Res, 26, 713-720.
-   Arvanitis, D. and Davy, A. (2008) Eph/ephrin signaling: networks. Genes Dev, 22, 416-429.
-   Bissell, M. J. and Radisky, D. (2001) Putting tumours in context. Nat Rev Cancer, 1, 46-54.
-   Blume-Jensen, P. and Hunter, T. (2001) Oncogenic kinase signalling. Nature, 411, 355-365.
-   Brantley-Sieders, D. M., Fang, W. B., Hicks, D. J., Zhuang, G., Shyr, Y. and Chen, J. (2005) Impaired tumor microenvironment in EphA2-deficient mice inhibits tumor angiogenesis and metastatic progression. Faseb J, 19, 1884-1886.
-   Call, J. A., Eckhardt, S. G. and Camidge, D. R. (2008) Targeted manipulation of apoptosis in cancer treatment. Lancet Oncol.
-   Cameron, H. L. and Foster, W. G. (2008) Dieldrin promotes resistance to anoikis in breast cancer cells in vitro. Reprod Toxicol, 25, 256-262.
-   Cha, J. Y., Lambert, Q. T., Reuther, G. W. and Der, C. J. (2008) Involvement of fibroblast growth factor receptor 2 isoform switching in mammary oncogenesis. Mol Cancer Res, 6, 435-445.
-   Desmedt, C., Piette, F., Loi, S., Wang, Y., Lallemand, F., Haibe-Kains, B., Viale, G., Delorenzi, M., Zhang, Y., d'Assignies, M. S., Bergh, J., Lidereau, R., Ellis, P., Harris, A. L., Klijn, J. G., Foekens, J. A., Cardoso, F., Piccart, M. J., Buyse, M. and Sotiriou, C. (2007) Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res, 13, 3207-3214.
-   Donjerkovic, D. and Scott, D. W. (2000) Activation-induced cell death in B lymphocytes. Cell Res, 10, 179-192.
-   Down, T. A. and Hubbard, T. J. (2002) Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res, 12, 458-461.
-   Eckert, L. B., Repasky, G. A., Ulku, A. S., McFall, A., Zhou, H., Sartor, C. I. and Der, C. J. (2004) Involvement of Ras activation in human breast cancer cell signaling, invasion, and anoikis. Cancer Res, 64, 4585-4592.
-   Griffiths-Jones, S. (2004) The microRNA Registry. Nucleic Acids Res, 32, D109-111.
-   Hannon, G. J., Sun, P., Carnero, A., Xie, L. Y., Maestro, R., Conklin, D. S. and Beach, D. (1999) MaRX: an approach to genetics in mammalian cells. Science, 283, 1129-1130.
-   Hardwicke, J., Schmaljohann, D., Boyce, D. and Thomas, D. (2008) Epidermal growth factor therapy and wound healing—past, present and future perspectives. Surgeon, 6, 172-177.
-   Hofmann, W. K., de Vos, S., Elashoff, D., Gschaidmeier, H., Hoelzer, D., Koeffler, H. P. and Ottmann, O. G. (2002) Relation between resistance of Philadelphia-chromosome-positive acute lymphoblastic leukaemia to the tyrosine kinase inhibitor STI571 and gene-expression profiles: a gene-expression study. Lancet, 359, 481-486.
-   Hunter, D. J., Kraft, P., Jacobs, K. B., Cox, D. G., Yeager, M., Hankinson, S. E., Wacholder, S., Wang, Z., Welch, R., Hutchinson, A., Wang, J., Yu, K., Chatterjee, N., Orr, N., Willett, W. C., Colditz, G. A., Ziegler, R. G., Berg, C. D., Buys, S. S., McCarty, C. A., Feigelson, H. S., Calle, E. E., Thun, M. J., Hayes, R. B., Tucker, M., Gerhard, D. S., Fraumeni, J. F., Jr., Hoover, R. N., Thomas, G. and Chanock, S. J. (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet, 39, 870-874.
-   Irish, J. M., Czerwinski, D. K., Nolan, G. P. and Levy, R. (2006) Altered B-cell receptor signaling kinetics distinguish human follicular lymphoma B cells from tumor-infiltrating nonmalignant B cells. Blood, 108, 3135-3142.
-   Jiang, X., Borgesi, R. A., McKnight, N. C., Kaur, R., Carpenter, C. L. and Balk, S. P. (2007) Activation of nonreceptor tyrosine kinase Bmx/Etk mediated by phosphoinositide 3-kinase, epidermal growth factor receptor, and ErbB3 in prostate cancer cells. J Biol Chem, 282, 32689-32698.
-   Karnoub, A. E., Dash, A. B., Vo, A. P., Sullivan, A., Brooks, M. W., Bell, G. W., Richardson, A. L., Polyak, K., Tubo, R. and Weinberg, R. A. (2007) Mesenchymal stem cells within tumour stroma promote breast cancer metastasis. Nature, 449, 557-563.
-   Kimura, K., Wakamatsu, A., Suzuki, Y., Ota, T., Nishikawa, T., Yamashita, R., Yamamoto, J., Sekine, M., Tsuritani, K., Wakaguri, H., Ishii, S., Sugiyama, T., Saito, K., Isono, Y., Irie, R., Kushida, N., Yoneyama, T., Otsuka, R., Kanda, K., Yokoi, T., Kondo, H., Wagatsuma, M., Murakawa, K., Ishida, S., Ishibashi, T., Takahashi-Fujii, A., Tanase, T., Nagai, K., Kikuchi, H., Nakai, K., Isogai, T. and Sugano, S. (2006) Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes. Genome Res, 16, 55-65.
-   Klein, R. (2004) Eph/ephrin signaling in morphogenesis, neural development and plasticity. Curr Opin Cell Biol, 16, 580-589.
-   Knudsen, S. (1999) Promoter2.0: for the recognition of PolII promoter sequences. Bioinformatics, 15, 356-361.
-   Kourtidis, A., Eifert, C. and Conklin, D. S. (2007) RNAi applications in target validation. Ernst Schering Res Found Workshop, 1-21.
-   Kullander, K. and Klein, R. (2002) Mechanisms and functions of Eph and ephrin signalling. Nat Rev Mol Cell Biol, 3, 475-486.
-   Kuppers, R. (2005) Mechanisms of B-cell lymphoma pathogenesis. Nat Rev Cancer, 5, 251-262.
-   Levy, S., Sutton, G., Ng, P. C., Feuk, L., Halpern, A. L., Walenz, B. P., Axelrod, N., Huang, J., Kirkness, E. F., Denisov, G., Lin, Y., MacDonald, J. R., Pang, A. W., Shago, M., Stockwell, T. B., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S. A., Busam, D. A., Beeson, K. Y., McIntosh, T. C., Remington, K. A., Abril, J. F., Gill, J., Borman, J., Rogers, Y. H., Frazier, M. E., Scherer, S. W., Strausberg, R. L. and Venter, J. C. (2007) The diploid genome sequence of an individual human. PLoS Biol, 5, e254.
-   Lindvall, J. M., Blomberg, K. E., Valiaho, J., Vargas, L., Heinonen, J. E., Berglof, A., Mohamed, A. J., Nore, B. F., Vihinen, M. and Smith, C. I. (2005) Bruton's tyrosine kinase: cell biology, sequence conservation, mutation spectrum, siRNA modifications, and expression profiling. Immunol Rev, 203, 200-215.
-   Lo, H. W., Hsu, S. C. and Hung, M. C. (2006) EGFR signaling pathway in breast cancers: from traditional signal transduction to direct nuclear translocalization. Breast Cancer Res Treat, 95, 211-218.
-   Lu, J. and Chu, D. (2008) Novel Therapies in Breast Cancer: What is New from ASCO 2008. J Hematol Oncol, 1, 16.
-   MacKeigan, J. P., Murphy, L. O. and Blenis, J. (2005) Sensitized RNAi screen of human kinases and phosphatases identifies new regulators of apoptosis and chemoresistance. Nat Cell Biol, 7, 591-600.
-   Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A. E., Kel-Margoulis, O. V., Kloos, D. U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S. and Wingender, E. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res, 31, 374-378.
-   Merlos-Suarez, A. and Batlle, E. (2008) Eph-ephrin signalling in adult tissues and cancer. Curr Opin Cell Biol, 20, 194-200.
-   Nahta, R. and Esteva, F. J. (2006) HER2 therapy: molecular mechanisms of trastuzumab resistance. Breast Cancer Res, 8, 215.
-   Noren, N. K. and Pasquale, E. B. (2004) Eph receptor-ephrin bidirectional signals that target Ras and Rho proteins. Cell Signal, 16, 655-666.
-   Ogawa, K., Pasqualini, R., Lindberg, R. A., Kain, R., Freeman, A. L. and Pasquale, E. B. (2000) The ephrin-A1 ligand and its receptor, EphA2, are expressed during tumor neovascularization. Oncogene, 19, 6043-6052.
-   Paddison, P. J., Silva, J. M., Conklin, D. S., Schlabach, M., Li, M., Aruleba, S., Balija, V., O'Shaughnessy, A., Gnoj, L., Scobie, K., Chang, K., Westbrook, T., Cleary, M., Sachidanandam, R., McCombie, W. R., Elledge, S. J. and Hannon, G. J. (2004) A resource for large-scale RNA-interference-based screens in mammals. Nature, 428, 427-431.
-   Park, C. C., Bissell, M. J. and Barcellos-Hoff, M. H. (2000) The influence of the microenvironment on the malignant phenotype. Mol Med Today, 6, 324-329.
-   Penault-Llorca, F., Bertucci, F., Adelaide, J., Parc, P., Coulier, F., Jacquemier, J., Birnbaum, D. and deLapeyriere, O. (1995) Expression of FGF and FGF receptor genes in human breast cancer. Int J Cancer, 61, 170-176.
-   Prevost, N., Woulfe, D., Tanaka, T. and Brass, L. F. (2002) Interactions between Eph kinases and ephrins provide a mechanism to support platelet aggregation once cell-to-cell contact has occurred. Proc Natl Acad Sci USA, 99, 9219-9224.
-   Prevost, N., Woulfe, D., Tognolini, M. and Brass, L. F. (2003) Contact-dependent signaling during the late events of platelet activation. J Thromb Haemost, 1, 1613-1627.
-   Prevost, N., Woulfe, D. S., Jiang, H., Stalker, T. J., Marchese, P., Ruggeri, Z. M. and Brass, L. F. (2005) Eph kinases and ephrins support thrombus growth and stability by regulating integrin outside-in signaling in platelets. Proc Natl Acad Sci USA, 102, 9820-9825.
-   Prevost, N., Woulfe, D. S., Tognolini, M., Tanaka, T., Jian, W., Fortna, R. R., Jiang, H. and Brass, L. F. (2004) Signaling by ephrinB1 and Eph kinases in platelets promotes Rap1 activation, platelet adhesion, and aggregation via effector pathways that do not require phosphorylation of ephrinB1. Blood, 103, 1348-1355.
-   Qiu, Y. and Kung, H. J. (2000) Signaling network of the Btk family kinases. Oncogene, 19, 5651-5661.
-   Sabbah, M., Emami, S., Redeuilh, G., Julien, S., Prevost, G., Zimber, A., Ouelaa, R., Bracke, M., De Wever, O. and Gespach, C. (2008) Molecular signature and therapeutic perspective of the epithelial-to-mesenchymal transitions in epithelial cancers. Drug Resist Updat.
-   Serra, V., Markman, B., Scaltriti, M., Eichhorn, P. J., Valero, V., Guzman, M., Botero, M. L., Llonch, E., Atzori, F., Di Cosimo, S., Maira, M., Garcia-Echeverria, C., Parra, J. L., Arribas, J. and Baselga, J. (2008) NVP-BEZ235, a dual PI3K/mTOR inhibitor, prevents PI3K signaling and inhibits the growth of cancer cells with activating PI3K mutations. Cancer Res, 68, 8022-8030.
-   Shinohara, M., Koga, T., Okamoto, K., Sakaguchi, S., Arai, K., Yasuda, H., Takai, T., Kodama, T., Morio, T., Geha, R. S., Kitamura, D., Kurosaki, T., Ellmeier, W. and Takayanagi, H. (2008) Tyrosine kinases Btk and Tec regulate osteoclast differentiation by linking RANK and ITAM signals. Cell, 132, 794-806.
-   Srinivasan, D. and Plattner, R. (2006) Activation of Abl tyrosine kinases promotes invasion of aggressive breast cancer cells. Cancer Res, 66, 5648-5655.
-   Srinivasan, D., Sims, J. T. and Plattner, R. (2008) Aggressive breast cancer cells are dependent on activated Abl kinases for proliferation, anchorage-independent growth and survival. Oncogene, 27, 1095-1105.
-   Tefferi, A. and Gilliland, D. G. (2007) Oncogenes in myeloproliferative disorders. Cell Cycle, 6, 550-566.
-   Tsukada, S., Saffran, D. C., Rawlings, D. J., Parolini, O., Allen, R. C., Klisak, I., Sparkes, R. S., Kubagawa, H., Mohandas, T., Quan, S. et al. (1993) Deficient expression of a B cell cytoplasmic tyrosine kinase in human X-linked agammaglobulinemia. Cell, 72, 279-290.
-   Vassilev, A., Ozer, Z., Navara, C., Mahajan, S. and Uckun, F. M. (1999) Bruton's tyrosine kinase as an inhibitor of the Fas/CD95 death-inducing signaling complex. J Biol Chem, 274, 1646-1656.
-   Villuendas, R., Steegmann, J. L., Pollan, M., Tracey, L., Granda, A., Fernandez-Ruiz, E., Casado, L. F., Martinez, J., Martinez, P., Lombardia, L., Villalon, L., Odriozola, J. and Piris, M. A. (2006) Identification of genes involved in imatinib resistance in CML: a gene-expression profiling approach. Leukemia, 20, 1047-1054.

Experimental

Cell culture. Breast cancer cell lines BT474, MCF-7 and MDA-MB-361 were obtained from ATCC. The Burkitt's lymphoma cell line NAMALWA was obtained from ATCC. Human mammary epithelial cells (HMEC) were obtained from Cambrex. HEK 293FT cells were obtained from Invitrogen. MCF10a were obtained from ???. BT474, MCF-7 and HEK 293FT cells were cultured in DMEM (Hyclone) supplemented with 10% FBS (Hyclone) and 100 U/μl of penicillin-streptomycin (Cellgro). NAMALWA were cultured in RPMI-1640 medium (ATCC) supplemented with 10% FBS (Hyclone) and 100 U/μl of penicillin-streptomycin. MDA-MB-361 were cultured in RPMI-1640 medium (ATCC) supplemented with 20% FBS and 100 U/μl of penicillin-streptomycin. HMECs were cultured in MEGM medium (Cambrex). MCF10a were cultured in DME/F12 1:1 medium supplemented with 5% horse serum, 20 ng/ml EGF, 0.5 ug/ml hydrocortisone, 100 ng/ml cholera toxin, 10 ug/ml insulin, and 100 U/μl of penicillin-streptomycin.

Reagents. The polyclonal anti-BTK antibody (C-20), the monoclonal anti-BTK antibody (E-9) and the polyclonal anti-GAPDH antibody (V-18) were obtained from Santa Cruz Biotechnology (Santa Cruz, Calif., USA). The polyclonal anti-BTK antibody used for immunofluorescence was obtained from ProSci Incorporated (Poway, Calif., USA). The monoclonal anti-FLAG M2 antibody was from Stratagene (Cedar Creek, Tex., USA). The polyclonal anti-caspase-3 antibody (Asp175) was obtained from Cell Signaling Technology (Danvers, Mass., USA). The protease inhibitor cocktail was obtained from Roche (Indianapolis, Ind., USA) and the phosphatase inhibitor cocktail was from Pierce (Rockford, Ill., USA).

Construction of the PTK shRNA library. A collection of short hairpin RNAs (shRNAs) targeting each of the PTKs was assembled from the pShagMagic2 (pSM2) shRNA library (Paddison et al., 2004). pSM2 shRNAs are modeled after pre-miRNAs and are transcribed by a pol III type promoter (U6) in a retroviral backbone. These vectors can be used to transfect cells to induce transient gene knockdown, or they can be used to generate virus capable of infecting cells for long-term, continuous hairpin expression. In most cases, multiple shRNA constructs target the same gene, such that over 300 shRNA clones were selected from the library to transfect into BT474 cells. Plasmid DNA was isolated, in 96-well format, from bacterial stocks containing each of the shRNAs using the Perfectprep Plasmid 96 Vac Direct Bind kit (Eppendorf, Hamburg, Germany).

RNAi screen—transfections. shRNA constructs were expressed from the pSHAGMAGIC 2 (pSM2) vector and derived from a genome-wide shRNA library (31). shRNAs targeting the firefly (Photinus pyralis) luciferase gene were used as controls. Transfection efficiency was monitored by co-transfection with a modified MSCV-Puro vector expressing green fluorescent protein (GFP). The alamarBlue (Biosource) assay was performed 96 h post-transfection, since BT474 cells have a population doubling time of ~100 hours. Mature sequences of the shRNAs that were most effective in decreasing BT474 viability are given in Table S1; a complete list is available on the RNAi Codex web page (http://codex.cshl.edu). The shRNAs targeting the luciferase gene were constructed as described on the RNAi Codex web page (http://codex.cshl.edu/scripts/newmain.pl) using a modified pSM2 vector containing the PheS gene (pSM2-PheS) in the cloning site as a negative selection marker. Transfections were performed using FuGENE 6 (Roche) according to the manufacturer's protocol. High-throughput transfections were performed using an EpMotion 5070 fluidics station (Eppendorf). alamarBlue was quantified using a BioTek HT Synergy plate reader.
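
As an illustration of how alamarBlue readings from a screen of this kind might be scored, the following minimal sketch expresses each shRNA well as a fraction of the mean luciferase-shRNA control signal. The function name, the fluorescence values and the 0.5 hit threshold are hypothetical and are not taken from the screen described above.

```python
from statistics import mean


def relative_viability(readings, control_readings):
    """Express each alamarBlue reading as a fraction of the mean control signal."""
    control_mean = mean(control_readings)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return {name: value / control_mean for name, value in readings.items()}


# Hypothetical fluorescence values (arbitrary units), for illustration only.
luciferase_controls = [1000.0, 1050.0, 950.0]
shrna_wells = {"shRNA_A": 400.0, "shRNA_B": 980.0}

scores = relative_viability(shrna_wells, luciferase_controls)

# Assumed cutoff: wells below 50% of the control signal are flagged as candidate hits.
hits = sorted(name for name, score in scores.items() if score < 0.5)
print(scores, hits)
```

Normalizing to control wells on the same plate is one common choice; other schemes, such as plate-median normalization, could be substituted without changing the rest of the sketch.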

Cell viability—apoptosis assays. For high-throughput experiments, cells grown on 96-well plates were washed once with 1× PBS, fixed with 2.5% formaldehyde and stained with Hoechst 33342 (Molecular Probes-Invitrogen). Pictures of cells were acquired using an In Cell Analyzer 1000 (GE Healthcare) high content imaging system, with a 20× objective. At least 30 fields were imaged per single experiment. Cell counts and statistics were then performed using the In Cell Investigator 3.4 high-content image analysis software (GE Healthcare). Apoptosis was detected by cleaved Caspase-3 after 48 h to 96 h of shRNA treatments. In this case, cells were fixed after treatment with 2.5% formaldehyde, washed with 1× PBS, permeabilized with 0.1% Triton-X 100 (Fisher Chemicals), blocked with 3% normal goat serum (Sigma-Aldrich), incubated with a 1:50-1:200 dilution of the primary antibody, washed with 1× PBS, incubated with a 1:800 dilution of the secondary antibody, washed again with 1× PBS and finally stained with Hoechst 33342 (Molecular Probes-Invitrogen). Cells were imaged by the In Cell Analyzer 1000 (GE Healthcare) or by a Leica TCS SP5 confocal microscope system (Leica Microsystems). At least 500 cells were counted for cleaved Caspase-3. Antibodies used: cleaved Caspase-3 (Asp175, #9661; Cell Signaling Technology), Alexa Fluor 568 goat anti-rabbit IgG (#A-11011; Invitrogen) and Alexa Fluor 568 goat anti-mouse IgG (#A-11004; Invitrogen).
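
The cleaved Caspase-3 counts described above reduce to a simple proportion. The sketch below is one possible way to compute it while enforcing the minimum of 500 counted cells; the function name and the example counts are hypothetical.

```python
def apoptotic_fraction(positive_cells, total_cells, min_cells=500):
    """Return the fraction of counted cells positive for cleaved Caspase-3."""
    if total_cells < min_cells:
        raise ValueError(f"count at least {min_cells} cells (got {total_cells})")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must lie between 0 and total_cells")
    return positive_cells / total_cells


# Hypothetical counts, for illustration only.
control = apoptotic_fraction(positive_cells=12, total_cells=600)
knockdown = apoptotic_fraction(positive_cells=150, total_cells=600)
print(f"control: {control:.1%}, knockdown: {knockdown:.1%}")
```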

Immunoblotting. Cell extracts for western blots were obtained using RIPA buffer (1% Triton X-100, 40 mM NaCl, 0.1% SDS, 10 mM Tris pH 8.0) or non-denaturing lysis buffer (20 mM Tris (pH 8.0), 137 mM NaCl, 10% glycerol, 1% Triton X-100, 2 mM EDTA), supplemented with a complete cocktail of protease inhibitors (Roche). For detection of phosphorylated epitopes, the PhosSTOP cocktail of phosphatase inhibitors (Roche) was added to the lysis buffer. Protein extracts were separated by SDS-PAGE, transferred to Immobilon-P (Millipore) membranes and immunoblotted according to standard protocols. Blots were imaged using a FluorChem HD (Alpha Innotech) imaging system. Antibodies used: GAPDH (V-18; Santa Cruz Biotechnology), anti-rabbit IgG-HRP (sc2204, Santa Cruz Biotechnology), anti-goat IgG-HRP (sc-2768, Santa Cruz Biotechnology), anti-mouse IgG-HRP (#31430; Pierce Biotechnology).

RNA isolation, quantitative polymerase chain reaction (qPCR), TaqMan-qPCR, RT-PCR. Total RNA was extracted from cells using TRIzol (Invitrogen) according to the manufacturer's instructions, followed by treatment with DNaseI (Roche) for 20 min at 37° C. and purification using the RNeasy column (Qiagen, Valencia, Calif., USA) cleanup protocol. The cDNA was synthesized using a modified version of the NIH/NCI Reid Lab cDNA synthesis protocol: a mixture containing 2-3.5 ug total RNA, 3 ul oligodT (Promega, Madison, Wis., USA) and 0.5 mM dNTPs was incubated at 65° C. for 5 minutes. Following incubation, 1× M-MLV reverse transcriptase buffer (Promega) and 256 U RNase Inhibitor (Fisher Scientific, Pittsburgh, Pa., USA) were added to the total RNA mixture, which was incubated at 42° C. for 1 minute. 200 u of M-MLV reverse transcriptase (Promega) was then added to the mixture, which was incubated at 42° C. for 1 hour, followed by a 15 minute incubation at 70° C. to inactivate the M-MLV enzyme. qPCR reactions using SYBR Green Master Mix (Applied Biosystems) or Taq SYBR Green Super Mix (BioRad) were performed on an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems). TaqMan qPCR reactions were performed using TaqMan Gene Expression Master Mix (Applied Biosystems) on an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems). The primer pairs used were designed using ABI's Primer Express software and are shown in Table 1. After the initial denaturation step (95° C. for 2.5 min), PCR reactions consisted of 40 cycles of a 95° C.-15 sec step and a 60° C.-1 min step. Analysis was conducted using ABsystems Real-Time Analysis software. For RT-PCR, an amplification mix was used consisting of 1× Taq polymerase buffer (Fisher), 0.2 mM dNTPs, 0.2 uM Fwd Primer, 0.2 uM Rvs Primer (Table 1), 1/10th total volume cDNA, and 5 u Taq polymerase (Fisher).
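
The RT-PCR amplification mix above is specified by final concentrations, so per-reaction pipetting volumes follow from C1 × V1 = C2 × V2. The sketch below works through that arithmetic. The final concentrations and the 1/10th-volume cDNA input are taken from the text; the 50 ul reaction volume, the stock concentrations (10× buffer, 10 mM dNTPs, 10 uM primers) and the 5 u/ul enzyme stock are assumptions made only for illustration.

```python
def mix_volumes(total_ul, stocks, finals, cdna_fraction=0.1, enzyme_ul=1.0):
    """Compute per-reaction volumes in ul using C1 * V1 = C2 * V2.

    stocks and finals must use the same unit for each component.
    """
    volumes = {name: finals[name] / stocks[name] * total_ul for name in finals}
    volumes["cDNA"] = cdna_fraction * total_ul
    volumes["Taq polymerase"] = enzyme_ul
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    volumes["water"] = water
    return volumes


# Final concentrations as stated in the text.
finals = {"buffer (x)": 1.0, "dNTPs (mM)": 0.2, "Fwd primer (uM)": 0.2, "Rvs primer (uM)": 0.2}
# Assumed stock concentrations (not given in the text).
stocks = {"buffer (x)": 10.0, "dNTPs (mM)": 10.0, "Fwd primer (uM)": 10.0, "Rvs primer (uM)": 10.0}

# Assumed 50 ul reaction; 5 u of Taq from an assumed 5 u/ul stock is 1 ul.
recipe = mix_volumes(total_ul=50.0, stocks=stocks, finals=finals, enzyme_ul=1.0)
for component, volume in recipe.items():
    print(f"{component}: {volume:.1f} ul")
```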

Small interfering RNA methodology. Btk was knocked down using the siGENOME SMARTpool duplex (Dharmacon, Lafayette, Colo., USA) transfected with Oligofectamine Reagent (Invitrogen, Gaithersburg, Md., USA) according to the manufacturer's instructions.

MarxIV triple flag tag vector construction. The flag tag specific sequence was amplified using the following reaction conditions: 100 ng of the pCMV-3Tag-3a Vector (Stratagene) as template, 1× Taq polymerase buffer (Fisher), 0.2 mM dNTPs, 0.2 uM each of the pCMV-3× Flag Fwd and Rvs primers (Table 1), and 5 u Taq polymerase (Fisher). The PCR products were purified using spin columns (LPS inc.). The PCR DNA and the MarxIV vector DNA were double digested with 10 u ApaI and 10 u XhoI restriction endonucleases (NEB) in 1× NEB buffer #4 containing 1× BSA. The digested DNA was run on a 2% agarose gel and the desired DNA fragments were cut out and purified using the GeneClean Turbo kit (Qbiogene), according to the manufacturer's specifications. The double-digested PCR product was ligated into the double-digested MarxIV vector in 1× Ligase buffer (NEB), with 400 U Ligase. The ligase mix was transformed into 5-alpha competent E. coli cells (NEB) and plated on LB plates containing 100 ug/mL ampicillin (Amp). Colonies were picked and grown in LB+100 ug/mL Amp overnight for plasmid DNA preparations. Plasmid DNA was double digested with 10 u XhoI and 10 u ApaI and run on a 2% agarose gel to determine which colonies contained the MarxIV vector with the integrated triple flag tag sequence.

Construction of the Btk-A and Btk-C MarxIV and MarxIV triple flag tag vectors. The Btk-A sequence was amplified using 10 ul Namalwa cDNA as a template with the Btk-Flag primer set (Table 1). The Btk-C sequence was amplified using overlap extension PCR (OLE). The first amplification reaction used 10 ng pCR2.1-TOPO plasmid DNA (Invitrogen) containing the amino terminus sequence of Btk-C (constructed in the 5′RACE experiment). The amplification reaction conditions were 0.5 mM of each of the N-term-Btk-C primers (Table 1), 1× Phusion DNA polymerase buffer (NEB), 0.2 mM dNTPs and 2 u Phusion DNA polymerase (NEB). The second amplification reaction amplified the C-terminal portion of the Btk-C gene using the same conditions, except that Namalwa cDNA rather than plasmid DNA was used as template and the Btk-C-term primer set (Table 1) was used for amplification. PCR products resulting from these two amplification reactions were purified using Uprep Spin columns (LPS inc.) and both were subsequently used in a third amplification reaction to generate a PCR product of the complete Btk-C sequence. The amplification reaction conditions were N-term PCR product, C-term PCR product, 0.5 mM each of the Btk-C-Flag primer set, 1× Phusion DNA polymerase buffer, 0.2 mM dNTPs and 2 u Phusion DNA polymerase. The full-length Btk-C PCR product and the MarxIV triple flag vector DNA were Uprep column purified (LPS inc.) and double-digested using 10 u BamHI and 10 u XhoI restriction endonucleases. The Btk-A PCR product was double-digested using 10 u MfeI and 10 u XhoI restriction endonucleases. The digested DNA was run on a 1% agarose gel and the desired DNA fragments were cut out and purified using the GeneClean Turbo kit (Qbiogene), according to the manufacturer's specifications. Each of the double-digested Btk-A and Btk-C PCR products was ligated into the double-digested MarxIV triple flag tag vector using 1× Ligase buffer (NEB), with 400 U Ligase. 
The ligase mix was transformed into competent E. coli cells and plated on LB plates containing ampicillin (100 ug/mL Amp). Colonies were picked and grown in LB+100 ug/mL Amp overnight for plasmid DNA preparations. Plasmid DNA was double digested with 10 u XhoI and 10 u BamHI (Btk-C insert) or 10 u XhoI and 10 u MfeI (Btk-A insert) and run on a 1% agarose gel to determine which colonies contained the Btk-A or Btk-C DNA fragment within the MarxIV triple flag tag vector. Selected clones were sequence verified.
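
The overlap extension step used for Btk-C relies on the N-terminal and C-terminal PCR products sharing a stretch of identical sequence, through which they anneal and are extended into one full-length product. The following sketch models only that joining logic in silico. The sequences are short hypothetical strings, not Btk-C sequences, and the function name and minimum overlap length are assumptions for illustration.

```python
def join_by_overlap(upstream, downstream, min_overlap=4):
    """Join two fragments on the longest suffix/prefix overlap they share."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(upstream), len(downstream))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(longest, min_overlap - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return upstream + downstream[size:]
    raise ValueError("fragments share no overlap of the required length")


# Hypothetical fragments sharing the 6-base overlap TTGACC.
n_term_product = "ATGGCTAGCTTGACC"
c_term_product = "TTGACCGGATAA"

full_length = join_by_overlap(n_term_product, c_term_product)
print(full_length)
```

A check of this kind can be useful when designing the inner primers, since the third amplification reaction only yields a full-length product if the two fragments overlap as intended.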

Stable infections and selection. BT474 cells that overexpress MarxIV-Flag, Btk-A-MarxIV-Flag or Btk-C-MarxIV-Flag were selected with 75 ug/ml Hygromycin B (Roche Diagnostics) for 10 days after infection with retrovirus produced by Phoenix A cells transfected with either the MarxIV-Flag, Btk-A-Flag or Btk-C-Flag vector.

5′RACE. Total RNA was extracted from BT474 cells using TRIzol (Invitrogen), according to the manufacturer's instructions. The total RNA was then incubated with DNase I (Roche) for 20 min at 37° C., and then purified using the RNeasy (Qiagen, Valencia, Calif., USA) RNA Cleanup protocol. The GeneRacer Kit (Invitrogen) was used, according to the manufacturer's specifications, for amplification of the 5′ portion of the Btk message. Briefly, the 5′ cap was removed from full-length mRNAs. The GeneRacer Oligo was ligated to the messenger RNAs (mRNAs). The mRNA was reverse transcribed into cDNA. The Btk specific transcript was amplified in a first round of amplification using the GeneRacer 5′ Primer (complementary to the GeneRacer Oligo sequence) and the Btk-RACE Primer (Table 1). In a second round of amplification, 1 ul of the product from the first amplification reaction was used as template with the Btk-RACE-Nest 3′ primer and the GeneRacer 5′ Nested primer (Table 1). The product was gel extracted and ligated into the pCR2.1-TOPO vector. The inserted DNA fragment was sequence verified.

1. A method of treating cancer, comprising: a. providing a subject with breast cancer cells and an inhibitor of a gene encoding a cytoplasmic tyrosine kinase; b. treating said subject with said inhibitor.
 2. The method of claim 1, wherein said cytoplasmic tyrosine kinase is a member of the Tec family of cytoplasmic tyrosine kinases.
 3. The method of claim 2, wherein said cytoplasmic tyrosine kinase is Bruton's Tyrosine Kinase.
 4. The method of claim 2, wherein said cytoplasmic tyrosine kinase is a variant of Bruton's Tyrosine Kinase comprising an amino-terminal extension.
 5. The method of claim 4, wherein said extension comprises an additional 34 amino acids.
 6. The method of claim 1, wherein said inhibitor comprises an interfering RNA.
 7. The method of claim 6, wherein treating with said RNA results in reduced proliferation of said breast cancer cells.
 8. A method of diagnosing cancer, comprising: a. providing cells suspected to be breast cancer cells and a ligand capable of binding to a variant of Bruton's Tyrosine Kinase, said variant comprising an amino-terminal extension; b. contacting said cells with said ligand under conditions wherein said variant is detected.
 9. The method of claim 8, wherein said extension comprises an additional 34 amino acids.
 10. The method of claim 9, wherein said ligand binds to a portion of said 34 amino acid extension.
 11. The method of claim 10, wherein said ligand comprises an antibody or fragment thereof.
 12. A composition, comprising a variant of Bruton's Tyrosine Kinase comprising an amino-terminal extension.
 13. The composition of claim 12, wherein said extension comprises an additional 34 amino acids.
 14. A ligand-protein complex comprising antibody bound to said variant of claim 12.
 15. A kit for use in the method of claim 8, the kit comprising a ligand capable of binding to a variant of Bruton's Tyrosine Kinase and instructions for its use.